Introduction to Apache Kafka
2016.06.14
skerrien
samuel.kerrien@gmail.com
About Me
● 14 years Java Developer / Tech lead / ScrumMaster
● 3 years Data Engineer (Hadoop, Hive, Pig, Mahout)
Meetup Outline
● Purpose of Kafka
● Architecture
● Demo: Zookeeper
● More Kafka Internals
● Demos: writing Clients
● Discussions
If data is the lifeblood of high technology,
Apache Kafka is the circulatory system in use at LinkedIn.
-- Todd Palino
Source: https://engineering.linkedin.com/kafka/running-kafka-scale
How Kafka Came To Be At LinkedIn
Source: http://www.infoq.com/presentations/kafka-big-data [Neha Narkhede]
A Week @ LinkedIn
630,516,047 msg/day (avg per broker)
7,298 msg/sec (avg per broker)
Source: http://www.confluent.io/kafka-summit-2016-ops-some-kafkaesque-days-in-operations-at-linkedin-in-2015 [Joel Koshy]
Kafka Use Cases
Messaging
Web Site Activity Tracking
Metrics
Log Aggregation
Stream Processing
Source: https://kafka.apache.org/08/uses.html
Kafka Use Cases: Messaging
● Requirements
○ Low throughput
○ Low latency
○ Durability
● Replacement for JMS
● Decoupling producer/consumer
● Kafka
○ Better throughput
○ Partitioning
○ Replication
○ Fault tolerance
Kafka Use Cases: Web Site Activity Tracking
Source: https://kafka.apache.org/08/uses.html
● Requirements
○ Very high volume
● Track user activity
○ Page view
○ Searches
○ Actions
● Goals
○ Real time processing
○ Monitoring
○ Load into Hadoop
■ Reporting
Kafka Use Cases: Metrics
Source: https://kafka.apache.org/08/uses.html
● Requirements
○ Very high volumes
● Feed of operational data
○ VMs
○ Apps
Kafka Use Cases: Log Aggregation
Source: https://kafka.apache.org/08/uses.html
● Log file collection on servers
● Similar to Scribe or Flume
○ Equally good performance
○ Stronger durability
○ Lower end-to-end latency
Kafka Use Cases: Stream Processing
Source: https://kafka.apache.org/08/uses.html
● Stage of processing = topic
● Companion frameworks
○ Storm
○ Spark
○ ...
How Other Companies Use Kafka
LinkedIn: activity streams, operational metrics
Yahoo: real time analytics (peak: 20Gbps compressed data), Kafka Manager
Twitter: Storm stream processing
Netflix: real time monitoring, event processing pipelines
Spotify: log delivery system
Airbnb: event pipelines
. . .
Kafka Architecture
[Diagram: Producers -> Brokers (Topics) -> Consumers, with Zookeeper and an "offset" label]
Kafka Controller
One broker takes the role of Controller, which manages:
● Partition leaders
● State of partitions
● Partition reassignments
● Replicas
Kafka characteristics
Fast
Scalable
Durable
Distributed
Kafka characteristics: Fast
● Single broker
○ Serves hundreds of MB/s
○ Handles thousands of clients
Kafka characteristics: Scalable
● Cluster as data backbone
● Expanded elastically
Kafka characteristics: Durable
● Messages persisted on disk
● TB per broker with no performance impact
● Configurable retention
Kafka characteristics: Distributed
● A cluster can serve larger streams than a single machine can
Anatomy Of A Message
Field  CRC      Magic   Attributes  Key Length  Key      Msg Length  Msg
Size   4 bytes  1 byte  1 byte      4 bytes     K bytes  4 bytes     M bytes
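The layout above can be sketched with Python's standard struct module. This is only an illustration, assuming the 0.8-era layout (magic byte 0), big-endian fields, and a CRC-32 computed over every byte that follows the CRC field. `encode_message` is a hypothetical helper, not part of any Kafka client.

```python
import struct
import zlib


def encode_message(key: bytes, value: bytes, magic: int = 0, attributes: int = 0) -> bytes:
    """Pack one message: CRC | Magic | Attributes | Key Length | Key | Msg Length | Msg."""
    body = struct.pack(">bb", magic, attributes)   # 1 byte + 1 byte
    body += struct.pack(">i", len(key)) + key       # 4 bytes + K bytes
    body += struct.pack(">i", len(value)) + value   # 4 bytes + M bytes
    crc = zlib.crc32(body) & 0xFFFFFFFF             # checksum of everything after the CRC field
    return struct.pack(">I", crc) + body            # 4 bytes of CRC come first


message = encode_message(b"00158", b'{"name":"Jeff"}')
print(len(message))  # 4 + 1 + 1 + 4 + K + 4 + M bytes, with K = 5 and M = 15 here
```

The fixed overhead per message is therefore 14 bytes on top of the key and the payload.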
Zookeeper Architecture
[Diagram: six clients connect to a Zookeeper Service made of five ZK nodes; the znode tree is rooted at /]
How Kafka Uses Zookeeper (1/2)
● Broker membership to a cluster
● Election of controller
● Topic configuration (#partitions, replica location, leader)
● Consumer offsets (alternative option since 0.8.2)
● Quotas (0.9.0)
● ACLs (0.9.0)
How Kafka Uses Zookeeper (2/2)
Kafka zNodes Structure
● /brokers/ids/[0...N] --> host:port (ephemeral node)
● /brokers/topics/[topic]/[0...N] --> nPartitions (ephemeral node)
● /consumers/[group_id]/ids/[consumer_id] --> {"topic1": #streams, ..., "topicN": #streams} (ephemeral node)
● /consumers/[group_id]/offsets/[topic]/[broker_id-partition_id] --> offset_counter_value (persistent node)
● /consumers/[group_id]/owners/[topic]/[broker_id-partition_id] --> consumer_node_id (ephemeral node)
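The difference between ephemeral and persistent zNodes can be illustrated with a toy in-memory model. This is not a Zookeeper client: `ToyZookeeper`, the session names, the host names and the offset value are all made up for the example.

```python
class ToyZookeeper:
    """In-memory stand-in for a znode tree; not a real Zookeeper client."""

    def __init__(self):
        self.nodes = {}  # path -> (data, owning session, or None if persistent)

    def create(self, path, data, ephemeral_session=None):
        self.nodes[path] = (data, ephemeral_session)

    def close_session(self, session):
        # Ephemeral znodes disappear when the session that created them ends.
        self.nodes = {
            path: (data, owner)
            for path, (data, owner) in self.nodes.items()
            if owner is None or owner != session
        }


zk = ToyZookeeper()
zk.create("/brokers/ids/0", "host-a:9092", ephemeral_session="broker-0")
zk.create("/brokers/ids/1", "host-b:9092", ephemeral_session="broker-1")
zk.create("/consumers/group-1/offsets/events/0-0", "42")  # persistent node

zk.close_session("broker-1")  # broker 1 dies or loses its connection
print(sorted(zk.nodes))
```

Broker 1 drops out of /brokers/ids as soon as its session ends, while the persistent consumer offset stays in place.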
Zookeeper Demo
● List zNodes
● Create zNode
● Update zNode
● Delete zNode
● Ephemeral zNode
● Watches (data update & node deletion)
Kafka Demo 1
● Install Kafka / Zookeeper (brew install kafka -> Kafka 0.8.2.1 + ZK 3.4.6 )
● Start a single broker cluster
○ Create a topic
○ Create a producer (Shell)
○ Create a consumer (Shell)
● Start a multi-broker cluster
○ Create a topic (partitioned & replicated)
○ Run producer & consumer
○ Kill a broker / check topic status (leader, ISR)
Write Ahead Log / Commit Log
0 1 2 3 4 5 6 7 8 9 10 11 12 13
Consumer A
Consumer B
Why a Commit Log?
● Records what happened and when
● Databases
○ Record changes to data structures (physical or logical)
○ Used for replication
● Distributed Systems
○ Update ordering
○ State machine replication principle
○ Last log timestamp defines its state
Source: https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
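A commit log with independent readers can be sketched in a few lines. This is a toy model, not Kafka code: `CommitLog` is a hypothetical class and the starting positions of Consumer A and Consumer B are arbitrary.

```python
class CommitLog:
    """Append-only sequence of records; each record is identified by its offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset assigned to the new record

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]


log = CommitLog()
for n in range(14):  # offsets 0..13
    log.append(f"event-{n}")

# Each consumer keeps its own position; reading never removes records.
offsets = {"Consumer A": 4, "Consumer B": 11}
for name, offset in list(offsets.items()):
    batch = log.read(offset, max_records=3)
    offsets[name] = offset + len(batch)
    print(name, batch)
```

Because the log itself holds no per-consumer state, adding a consumer costs one more integer to track.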
Topic Partitioning
[Diagram: a producer application writes to a topic and a consumer application reads from it; the topic is split into Partition 1, Partition 2 ... Partition N, each hosted on brokers]
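One common way to spread keyed messages over partitions is to hash the key and take the result modulo the partition count. The sketch below assumes three partitions and uses CRC-32 as a stand-in hash; real clients use their own hash function, and `choose_partition` is a hypothetical name.

```python
import zlib

NUM_PARTITIONS = 3  # assumed partition count for the illustration


def choose_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    # CRC-32 is a stand-in here; real clients use their own hash function.
    return zlib.crc32(key) % num_partitions


partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, value in [(b"user-1", "login"), (b"user-2", "search"), (b"user-1", "logout")]:
    partitions[choose_partition(key)].append((key, value))

# Records with the same key land in the same partition, in send order.
print(partitions)
```

Ordering is only preserved within a partition, so the choice of key decides which messages stay ordered relative to each other.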
Topic Replication
kafka-topics.sh --zookeeper $ZK --create --topic events --partitions 3 --replication-factor 2
[Diagram: logically, the topic has partitions P1, P2 and P3; physically, each partition is stored twice across Broker 1, Broker 2 and Broker 3]
Kafka Guarantees
● Messages sent by a producer to a particular topic partition will be appended in the order they are sent.
● A consumer instance sees messages in the order they are stored in the log.
● For a topic with replication factor N, Kafka will tolerate up to N-1 server failures without losing any messages committed to a topic.
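The N-1 failure tolerance can be checked on a small example with 3 partitions, 3 brokers and replication factor 2. The round-robin placement below is a simplification chosen for the sketch, not Kafka's actual assignment algorithm, and it ignores in-sync replica tracking.

```python
from itertools import combinations

BROKERS = [1, 2, 3]
PARTITIONS = ["P1", "P2", "P3"]
REPLICATION_FACTOR = 2


def assign_replicas(partitions, brokers, replication_factor):
    """Simplified placement: replica r of partition i goes to broker (i + r) mod len(brokers)."""
    return {
        partition: [brokers[(i + r) % len(brokers)] for r in range(replication_factor)]
        for i, partition in enumerate(partitions)
    }


def survives(assignment, failed_brokers):
    """True if every partition still has at least one replica on a live broker."""
    return all(set(replicas) - set(failed_brokers) for replicas in assignment.values())


assignment = assign_replicas(PARTITIONS, BROKERS, REPLICATION_FACTOR)
print(assignment)

# Replication factor N = 2 tolerates any N - 1 = 1 broker failure ...
single_failures_ok = all(survives(assignment, [b]) for b in BROKERS)
# ... but not every combination of N = 2 failures.
double_failures_ok = all(survives(assignment, pair) for pair in combinations(BROKERS, 2))
print(single_failures_ok, double_failures_ok)
```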
Durability Guarantees
Producer can configure acknowledgements:
● <= 0.8.2: request.required.acks
● >= 0.9.0: acks
Value                      Impact                                                  Durability
0                          Producer doesn’t wait for leader.                       weak
1 (default)                Producer waits for leader. Leader sends ack when        medium
                           message written to log. No wait for followers.
all (0.9.0) / -1 (0.8.2)   Producer waits for leader. Leader sends ack when all    strong
                           in-sync replicas (ISR) have acknowledged.
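The three acknowledgement levels can be expressed as a small decision function. This is a toy model of the rules listed above, not producer code; `send_is_complete` and its parameters are hypothetical names.

```python
def send_is_complete(acks, leader_written, isr_acks):
    """Toy model of when a producer may treat a send as complete."""
    if acks == 0:
        return True                              # does not wait for the leader
    if acks == 1:
        return leader_written                    # leader wrote the message to its log
    if acks in ("all", -1):
        return leader_written and all(isr_acks)  # every in-sync replica acknowledged
    raise ValueError(f"unsupported acks value: {acks!r}")


# The leader has written the message; one of two in-sync followers has not caught up yet.
for acks in (0, 1, "all"):
    print(acks, send_is_complete(acks, leader_written=True, isr_acks=[True, False]))
```

With acks=1 the send completes even though a follower is behind, which is why a leader failure at that moment can lose the message.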
Consumer Offset Management
● < 0.8.2: Zookeeper only
○ Zookeeper is not meant for heavy writes => scalability issues
● >= 0.8.2: Kafka topic (__consumer_offsets)
○ Configurable: offsets.storage=kafka
● Documentation shows how to migrate offsets from Zookeeper to Kafka
http://kafka.apache.org/082/documentation.html#offsetmigration
Data Retention
3 ways to configure it:
● Time based
● Size based
● Log compaction based
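Time based and size based retention both work by dropping whole log segments, oldest first. The sketch below is a simplified model with made-up segment sizes and timestamps. `Segment` and `apply_retention` are hypothetical names, and Kafka's own cleaner differs in details, for example it never deletes the active segment.

```python
from collections import namedtuple

Segment = namedtuple("Segment", ["base_offset", "size_bytes", "last_modified_ms"])


def apply_retention(segments, now_ms, retention_ms=None, retention_bytes=None):
    """Return the segments that survive; segments are ordered oldest first."""
    kept = list(segments)
    # Time based: drop segments whose last write is older than the retention period.
    if retention_ms is not None:
        kept = [s for s in kept if now_ms - s.last_modified_ms <= retention_ms]
    # Size based: drop the oldest segments while the log exceeds the size limit.
    if retention_bytes is not None:
        while kept and sum(s.size_bytes for s in kept) > retention_bytes:
            kept.pop(0)
    return kept


segments = [Segment(0, 100, 1_000), Segment(50, 100, 5_000), Segment(100, 100, 9_000)]
by_time = apply_retention(segments, now_ms=10_000, retention_ms=6_000)
by_size = apply_retention(segments, now_ms=10_000, retention_bytes=150)
print([s.base_offset for s in by_time], [s.base_offset for s in by_size])
```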
Broker Configuration
log.retention.bytes={ -1|...}
log.retention.{ms,minutes,hours}=...
log.retention.check.interval.ms=...
log.cleanup.policy={delete|compact}
log.cleaner.enable={ false|true}
log.cleaner.threads=1
log.cleaner.io.max.bytes.per.second=Double.MaxValue
log.cleaner.backoff.ms=15000
log.cleaner.delete.retention.ms=1d
Topic Configuration
cleanup.policy=...
delete.retention.ms=...
...
Reconfiguring a Topic at Runtime
kafka-topics.sh --zookeeper localhost:2181 \
    --alter --topic my-topic \
    --config max.message.bytes=128000
kafka-topics.sh --zookeeper localhost:2181 \
    --alter --topic my-topic \
    --deleteConfig max.message.bytes
Log Compaction (1/4)
$ kafka-topics.sh --zookeeper localhost:2181 \
    --create \
    --topic employees \
    --replication-factor 1 \
    --partitions 1 \
    --config cleanup.policy=compact
$ echo '00158,{"name":"Jeff", "title":"Developer"}' | kafka-console-producer.sh \
    --broker-list localhost:9092 \
    --topic employees \
    --property parse.key=true \
    --property key.separator=, \
    --new-producer
$ echo '00223,{"name":"Chris", "title":"Senior Developer"}' | kafka-console-producer.sh \
    --broker-list localhost:9092 \
    --topic employees \
    --property parse.key=true \
    --property key.separator=, \
    --new-producer
Source: http://www.shayne.me/blog/2015/2015-06-25-everything-about-kafka-part-2/
Log Compaction (2/4)
$ echo '00158,{"name":"Jeff", "title":"Senior Developer"}' | kafka-console-producer.sh \
    --broker-list localhost:9092 \
    --topic employees \
    --property parse.key=true \
    --property key.separator=, \
    --new-producer
$ kafka-console-consumer.sh --zookeeper localhost:2181 \
    --topic employees \
    --from-beginning \
    --property print.key=true \
    --property key.separator=,
00158,{"name":"Jeff", "title":"Developer"}
00223,{"name":"Chris", "title":"Senior Developer"}
00158,{"name":"Jeff", "title":"Senior Developer"}
Source: http://www.shayne.me/blog/2015/2015-06-25-everything-about-kafka-part-2/
Log Compaction (3/4)
$ kafka-topics.sh --zookeeper localhost:2181 \
    --alter \
    --topic employees \
    --config segment.ms=30000
<... Wait 30 seconds ...>
$ kafka-console-consumer.sh --zookeeper localhost:2181 \
    --topic employees \
    --from-beginning \
    --property print.key=true \
    --property key.separator=,
00158,{"name":"Jeff", "title":"Developer"}
00223,{"name":"Chris", "title":"Senior Developer"}
00158,{"name":"Jeff", "title":"Senior Developer"}
Source: http://www.shayne.me/blog/2015/2015-06-25-everything-about-kafka-part-2/
Log Compaction (4/4)
$ echo '00301,{"name":"Dan", "title":"Project Manager"}' | kafka-console-producer.sh \
    --broker-list localhost:9092 \
    --topic employees \
    --property parse.key=true \
    --property key.separator=, \
    --new-producer
$ kafka-console-consumer.sh --zookeeper localhost:2181 \
    --topic employees \
    --from-beginning \
    --property print.key=true \
    --property key.separator=,
00223,{"name":"Chris", "title":"Senior Developer"}
00158,{"name":"Jeff", "title":"Senior Developer"}
00301,{"name":"Dan", "title":"Project Manager"}
Source: http://www.shayne.me/blog/2015/2015-06-25-everything-about-kafka-part-2/
...
[2015-06-25 18:24:08,102] INFO [kafka-log-cleaner-thread-0],
Log cleaner thread 0 cleaned log employees-0 (dirty section = [0, 3])
0.0 MB of log processed in 0.1 seconds (0.0 MB/sec).
Indexed 0.0 MB in 0.1 seconds (0.0 Mb/sec, 90.6% of total time)
Buffer utilization: 0.0%
Cleaned 0.0 MB in 0.0 seconds (0.0 Mb/sec, 9.4% of total time)
Start size: 0.0 MB (3 messages)
End size: 0.0 MB (2 messages)
31.0% size reduction (33.3% fewer messages)
(kafka.log.LogCleaner)
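The effect of the cleaner on the employees topic can be reproduced with a few lines of Python. This is a simplified model of compaction that keeps the latest record per key in offset order. It ignores segments and delete markers, and `compact` is a hypothetical function, not Kafka code.

```python
def compact(log):
    """Keep only the most recent record for each key, preserving offset order."""
    latest_offset = {key: offset for offset, (key, _) in enumerate(log)}
    return [record for offset, record in enumerate(log) if latest_offset[record[0]] == offset]


log = [
    ("00158", '{"name":"Jeff", "title":"Developer"}'),
    ("00223", '{"name":"Chris", "title":"Senior Developer"}'),
    ("00158", '{"name":"Jeff", "title":"Senior Developer"}'),
]
compacted = compact(log)  # 3 messages in, 2 messages out
compacted.append(("00301", '{"name":"Dan", "title":"Project Manager"}'))
for key, value in compacted:
    print(f"{key},{value}")
```

The printed records follow the same order as the consumer output in the demo: Chris, then Jeff as Senior Developer, then Dan.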
Kafka Performance - Theory
● Efficient Storage
○ Fast sequential write and read
○ Leverages OS page cache (i.e. RAM)
○ Avoids storing data twice in JVM and in OS cache (better perf on startup)
○ Caches 28-30 GB of data on a 32 GB machine
○ Zero-copy I/O using the sendfile system call (https://www.ibm.com/developerworks/library/j-zerocopy/)
● Batching of messages + compression
● Broker doesn’t hold client state
● Depends on the persistence guarantees chosen via request.required.acks
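Why batching and compression work well together can be shown with the standard library. The sketch assumes 100 small JSON messages with made-up fields, uses zlib as a stand-in for the codecs Kafka supports, and joins the batch with newlines purely for illustration.

```python
import json
import zlib

messages = [
    json.dumps({"user": n, "action": "page_view", "page": "/home"}).encode()
    for n in range(100)
]

# Compressing each message on its own: little redundancy to exploit.
individual = sum(len(zlib.compress(m)) for m in messages)

# Compressing one batch: repeated field names and values compress well.
batch = zlib.compress(b"\n".join(messages))

print(individual, len(batch))
```

A batch also means fewer, larger network requests and disk writes, which is where sequential I/O pays off.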
Kafka Benchmark (1/5)
● 0.8.1
● Setup
○ 6 Machines
■ Intel Xeon 2.5 GHz processor with six cores
■ Six 7200 RPM SATA drives (822 MB/sec of linear disk I/O)
■ 32GB of RAM
■ 1Gb Ethernet
○ 3 nodes for brokers + 3 for ZK and clients
Source: https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
Kafka Benchmark (2/5)
Source: https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
Producer throughput:
Use case                                                  Messages/sec   Throughput (MB/sec)
1 producer thread, no replication                         821,557        78.3
  (6 partitions, 50M messages, payload: 100 bytes)
1 producer thread, 3x asynchronous replication (ack=1)    786,980        75.1
1 producer thread, 3x synchronous replication (ack=all)   421,823        40.2
3 producers, 3x asynchronous replication                  2,024,032      193.0
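The MB/sec figures follow from the message rate and the 100-byte payload. The check below assumes that MB means 2**20 bytes, which is what makes the numbers line up; `throughput_mb` is a hypothetical helper.

```python
MESSAGE_SIZE_BYTES = 100     # payload size used in the benchmark
BYTES_PER_MB = 1024 * 1024   # assumption: MB means 2**20 bytes


def throughput_mb(messages_per_sec: int) -> float:
    return messages_per_sec * MESSAGE_SIZE_BYTES / BYTES_PER_MB


producer_results = {
    "1 producer thread, no replication": 821_557,
    "1 producer thread, 3x asynchronous replication": 786_980,
    "1 producer thread, 3x synchronous replication": 421_823,
    "3 producers, 3x asynchronous replication": 2_024_032,
}
for use_case, messages_per_sec in producer_results.items():
    print(f"{use_case}: {throughput_mb(messages_per_sec):.1f} MB/sec")
```

Synchronous replication roughly halves the message rate of a single producer thread in this setup, and three producers push the same cluster past 2 million messages per second.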
Kafka Benchmark (3/5)
Source: https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
Sustained producer throughput
Kafka Benchmark (4/5)
Source: https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
Consumer throughput from 6 partitions, 3x replication:
Use case                       Messages/sec   Throughput (MB/sec)
1 consumer                     940,521        89.7
3 consumers (1 per machine)    2,615,968      249.5
Kafka Benchmark (5/5)
Source: https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
End-to-end latency:
● 2ms (median)
● 3ms (99th percentile)
● 14ms (99.9th percentile)
Kafka Demo 2 - Java High Level API
● Start a multi broker cluster + replicated topic
○ Write Java Partitioned Producer
○ Write Multi-Threaded Consumer
● Unit testing with Kafka
Kafka Versions
0.8.2 (2014.12)
● New Producer API
● Delete topic
● Scalable offset writes
0.9.x (2015.10)
● Security
○ Encryption
○ Kerberos
○ ACLs
● Quotas (client rate control)
● Kafka Connect
0.10.x (2016.03)
● Kafka Streams
● Rack Awareness
● More SASL features
● Timestamp in messages
● API to better manage Connectors
Starting with Kafka: Jay Kreps’ Recommendations
● Start with a single cluster
● Only a few non-critical, limited use cases
● Pick a single data format for a given organisation
○ Avro
■ Good language support
■ One schema per topic (message validation, documentation...)
■ Supports schema evolution
■ Data embeds schema
■ Makes data scientists' jobs easier
■ Put some thought into field naming conventions
Source: http://www.confluent.io/blog/stream-data-platform-2/
Conclusions
● Easy to start with for a PoC
● Maybe not so easy to build a production system from scratch
● Must have serious monitoring in place (see Yahoo, Confluent, DataDog)
● Vibrant community, fast-paced technology
● Videos of Kafka Summit are online: http://kafka-summit.org/sessions/
https://github.com/samuel-kerrien/kafka-demo
Anton Nazaruk
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Monal Daxini
 
Insta clustr seattle kafka meetup presentation bb
Insta clustr seattle kafka meetup presentation   bbInsta clustr seattle kafka meetup presentation   bb
Insta clustr seattle kafka meetup presentation bb
Nitin Kumar
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
Athens Big Data
 
Timothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for MLTimothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for ML
Edunomica
 
Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1
Knoldus Inc.
 
Netflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineNetflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipeline
Monal Daxini
 
Instaclustr Kafka Meetup Sydney Presentation
Instaclustr Kafka Meetup Sydney PresentationInstaclustr Kafka Meetup Sydney Presentation
Instaclustr Kafka Meetup Sydney Presentation
Ben Slater
 
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Monal Daxini
 
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUpStrimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
José Román Martín Gil
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Guido Schmutz
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
DataWorks Summit/Hadoop Summit
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
Amir Sedighi
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
aspyker
 
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
Timothy Spann
 
Event Driven Microservices
Event Driven MicroservicesEvent Driven Microservices
Event Driven Microservices
Fabrizio Fortino
 
Stream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETStream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NET
confluent
 
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messages
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messagesMulti-Tenancy Kafka cluster for LINE services with 250 billion daily messages
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messages
LINE Corporation
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache Kafka
Joe Stein
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
Anton Nazaruk
 

Introduction to Apache Kafka

  • 1. Introduction to Apache Kafka 2016.06.14 skerrien samuel.kerrien@gmail.com
  • 2. About Me ● 14 years Java Developer / Tech lead / ScrumMaster ● 3 years Data Engineer (Hadoop, Hive, Pig, Mahout) skerrien samuel.kerrien@gmail.com
  • 3. Meetup Outline ● Purpose of Kafka ● Architecture ● Demo: Zookeeper ● More Kafka Internals ● Demos: writing Clients ● Discussions
  • 4. If data is the lifeblood of high technology, Apache Kafka is the circulatory system in use at LinkedIn. -- Todd Palino Source: https://engineering.linkedin.com/kafka/running-kafka-scale
  • 5. How Kafka Came To Be At LinkedIn Source: http://www.infoq.com/presentations/kafka-big-data [Neha Narkhede]
  • 6. How Kafka Came To Be At LinkedIn Source: http://www.infoq.com/presentations/kafka-big-data [Neha Narkhede]
  • 7. A Week @ LinkedIn 630,516,047 msg/day (avg per broker) 7,298 msg/sec (avg per broker) Source: http://www.confluent.io/kafka-summit-2016-ops-some-kafkaesque-days-in-operations-at-linkedin-in-2015 [Joel Koshy]
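The two per-broker figures on this slide are consistent with each other, which a quick calculation shows. This is only an arithmetic check; the function name `per_second` is made up for illustration and is not from the talk.

```python
# Sanity check: convert the per-broker daily average into a per-second average.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(messages_per_day: int) -> float:
    """Average messages per second for a given daily message count."""
    return messages_per_day / SECONDS_PER_DAY


rate = per_second(630_516_047)
print(round(rate))  # rounds to the 7,298 msg/sec quoted on the slide
```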
  • 8. Kafka Use Cases Messaging Web Site Activity Tracking Metrics Log Aggregation Stream Processing Source: https://kafka.apache.org/08/uses.html ● Requirements ○ Low throughput ○ Low latency ○ Durability ● Replacement for JMS ● Decoupling producer/consumer ● Kafka ○ Better throughput ○ Partitioning ○ Replication ○ Fault tolerance
  • 9. Kafka Use Cases Messaging Web Site Activity Tracking Metrics Log Aggregation Stream Processing Source: https://kafka.apache.org/08/uses.html ● Requirements ○ Very high volume ● Track user activity ○ Page views ○ Searches ○ Actions ● Goals ○ Real time processing ○ Monitoring ○ Load into Hadoop ■ Reporting
  • 10. Kafka Use Cases Messaging Web Site Activity Tracking Metrics Log Aggregation Stream Processing Source: https://kafka.apache.org/08/uses.html ● Requirements ○ Very high volumes ● Feed of operational data ○ VMs ○ Apps
  • 11. Kafka Use Cases Messaging Web Site Activity Tracking Metrics Log Aggregation Stream Processing Source: https://kafka.apache.org/08/uses.html ● Log file collection on servers ● Similar to Scribe or Flume ○ Equally good performance ○ Stronger durability ○ Lower end-to-end latency
  • 12. Kafka Use Cases Messaging Web Site Activity Tracking Metrics Log Aggregation Stream Processing Source: https://kafka.apache.org/08/uses.html ● Stage of processing = topic ● Companion framework ○ Storm ○ Spark ○ ...
  • 13. How Other Companies Use Kafka LinkedIn: activity streams, operational metrics Yahoo: real time analytics (peak: 20Gbps compressed data), Kafka Manager Twitter: Storm stream processing Netflix: real time monitoring, event processing pipelines Spotify: log delivery system Airbnb: event pipelines . . .
  • 14. Kafka Architecture Brokers Producer Consumer Consumer Consumer Consumer Producer Producer Producer Topic Topic Topic Topic Topic Topic Zookeeper offset
  • 15. Kafka Controller One broker takes the role of Controller, which manages: ● Partition leaders ● State of partitions ● Partition reassignments ● Replicas
  • 16. Kafka characteristics Fast Scalable Durable Distributed ● Single broker ○ Serve 100s MB/s ○ 1000s clients
  • 17. Kafka characteristics Fast Scalable Durable Distributed ● Cluster as data backbone ● Expanded elastically
  • 18. Kafka characteristics Fast Scalable Durable Distributed ● Messages persisted on disk ● TB per broker with no performance impact ● Configurable retention
  • 19. Kafka characteristics Fast Scalable Durable Distributed ● Cluster can serve larger streams than a single machine can
  • 20. Anatomy Of A Message CRC (4 bytes) | Magic (1 byte) | Attributes (1 byte) | Key Length (4 bytes) | Key (K bytes) | Msg Length (4 bytes) | Msg (M bytes)
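A minimal sketch of that layout, reading the slide's field sizes as CRC (4), magic (1), attributes (1), key length (4), key (K), message length (4), message (M). The big-endian byte order and the choice to compute the CRC over everything after the CRC field are assumptions based on the Kafka 0.8 wire format, not something the slide states, and `encode_message` is a hypothetical helper rather than a Kafka API.

```python
import struct
import zlib

# Fixed per-message overhead: CRC + magic + attributes + key length + message length.
FIXED_OVERHEAD = 4 + 1 + 1 + 4 + 4  # 14 bytes


def encode_message(key: bytes, value: bytes, magic: int = 0, attributes: int = 0) -> bytes:
    """Pack one message in the field order shown on the slide (illustrative only)."""
    body = struct.pack(">bb", magic, attributes)   # magic byte, attributes byte
    body += struct.pack(">i", len(key)) + key      # 4-byte key length, then the key
    body += struct.pack(">i", len(value)) + value  # 4-byte message length, then the message
    crc = zlib.crc32(body) & 0xFFFFFFFF            # assumption: CRC covers all bytes after it
    return struct.pack(">I", crc) + body


encoded = encode_message(b"k1", b"hello")
print(len(encoded))  # FIXED_OVERHEAD + K + M
```

With small payloads the 14 bytes of framing are a noticeable share of each message, which is one reason batching helps.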
  • 22. How Kafka Uses Zookeeper (1/2) ● Broker membership to a cluster ● Election of controller ● Topic configuration (#partitions, replica location, leader) ● Consumer offsets (alternative option since 0.8.2) ● Quotas (0.9.0) ● ACLs (0.9.0)
  • 23. How Kafka Uses Zookeeper (2/2) Kafka zNodes Structure ● /brokers/ids/[0...N] --> host:port (ephemeral node) ● /brokers/topics/[topic]/[0...N] --> nPartitions (ephemeral node) ● /consumers/[group_id]/ids/[consumer_id] --> {"topic1": #streams, ..., "topicN": #streams} (ephemeral node) ● /consumers/[group_id]/offsets/[topic]/[broker_id-partition_id] --> offset_counter_value (persistent node) ● /consumers/[group_id]/owners/[topic]/[broker_id-partition_id] --> consumer_node_id (ephemeral node)
  • 24. Zookeeper Demo ● List zNodes ● Create zNode ● Update zNode ● Delete zNode ● Ephemeral zNode ● Watches (data update & node deletion)
  • 25. Kafka Demo 1 ● Install Kafka / Zookeeper (brew install kafka -> Kafka 0.8.2.1 + ZK 3.4.6 ) ● Start a single broker cluster ○ Create a topic ○ Create a producer (Shell) ○ Create a consumer (Shell) ● Start a multi-broker cluster ○ Create a topic (partitioned & replicated) ○ Run producer & consumer ○ Kill a broker / check topic status (leader, ISR)
  • 26. Write Ahead Log / Commit Log 0 1 2 3 4 5 6 7 8 9 10 11 12 13
  • 27. Write Ahead Log / Commit Log 0 1 2 3 4 5 6 7 8 9 10 11 12 13 Consumer A Consumer B
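The picture of two consumers reading the same log at different positions can be sketched in a few lines. This is a toy in-memory model, not Kafka code: `CommitLog` and `Consumer` are hypothetical names, and the log here is just a Python list.

```python
class CommitLog:
    """Append-only sequence of records; each record's offset is its position."""

    def __init__(self):
        self._records = []

    def append(self, record) -> int:
        """Add a record at the end and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 1) -> list:
        """Return up to max_records records starting at offset, without removing them."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Tracks its own read position; reading never changes the log itself."""

    def __init__(self, log: CommitLog, offset: int = 0):
        self.log = log
        self.offset = offset

    def poll(self, max_records: int = 1) -> list:
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = CommitLog()
for n in range(14):                     # offsets 0..13, as on the slide
    log.append(f"event-{n}")

consumer_a = Consumer(log)              # starts from the beginning
consumer_b = Consumer(log, offset=10)   # starts further along the same log
first_a = consumer_a.poll(max_records=3)
first_b = consumer_b.poll(max_records=3)
```

Because each consumer only advances its own offset, a slow consumer does not hold back a fast one, and either can rewind by resetting its offset.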
  • 28. Why Commit Log? ● Records what happened and when ● Databases ○ Record changes to data structures (physical or logical) ○ Used for replication ● Distributed Systems ○ Update ordering ○ State machine Replication principle ○ Last log timestamp defines its state Source: https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
  • 29. Topic Partitioning Producer Application Consumer Application Partition 1 Topic writes reads Brokers Brokers Partition 2 Partition N Brokers
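One common way a producer spreads keyed messages over the partitions of a topic is to hash the key and take the result modulo the partition count. The sketch below uses CRC32 purely as a stand-in hash; Kafka's own partitioners use a different hash function, so the partition numbers here would not match a real cluster. `partition_for` is a hypothetical helper.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index in [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # Stand-in hash (assumption): any stable hash gives the same key -> same partition property.
    return zlib.crc32(key) % num_partitions


for key in (b"user-1", b"user-2", b"user-1"):
    print(key, "->", partition_for(key, num_partitions=3))
```

The property that matters is stability: all messages with the same key land in the same partition, so their relative order is preserved.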
  • 30. Topic Replication kafka-topics.sh --zookeeper $ZK --create --topic events --partitions 3 --replication-factor 2 P1 P2 P3 Topic Logical Broker 1 Broker 2 Broker 3 P1 P1 P2 P3 P2 P3 Physical
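To illustrate how three partitions with replication factor 2 can be laid out over three brokers, here is a simplified round-robin placement. It is an assumption-laden sketch: Kafka's real assignment algorithm differs in detail (for example in how it picks the starting broker), and `assign_replicas` is a hypothetical function.

```python
def assign_replicas(num_partitions: int, replication_factor: int, brokers: list) -> dict:
    """Place each partition's replicas on consecutive brokers, wrapping around."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        assignment[partition] = [
            brokers[(partition + replica) % len(brokers)]
            for replica in range(replication_factor)
        ]
    return assignment


layout = assign_replicas(num_partitions=3, replication_factor=2, brokers=[1, 2, 3])
for partition, replicas in layout.items():
    print(f"partition {partition}: replicas on brokers {replicas}")
```

In this layout every broker holds two replicas, and every partition survives the loss of any single broker.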
  • 31. Kafka Guarantees ● Messages sent by a producer to a particular topic partition will be appended in the order they are sent. ● A consumer instance sees messages in the order they are stored in the log. ● For a topic with replication factor N, Kafka will tolerate up to N-1 server failures without losing any messages committed to a topic.
  • 32. Durability Guarantees Producer can configure acknowledgements: ● <= 0.8.2: request.required.acks ● >= 0.9.0: acks
      Value | Impact | Durability
      0 | Producer doesn’t wait for leader | weak
      1 (default) | Producer waits for leader. Leader sends ack when message written to log. No wait for followers. | medium
      all (0.9.0), -1 (0.8.2) | Producer waits for leader. Leader sends ack when all in-sync replicas have acknowledged. | strong
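The three acknowledgement levels can be expressed as a small decision function. This is a toy model of the rules on the slide, not broker code; `producer_send_complete` and its parameters are hypothetical names.

```python
def producer_send_complete(acks, written_to_leader_log: bool,
                           in_sync_replicas: set, replicas_acked: set) -> bool:
    """Return True once the producer may treat a send as acknowledged."""
    if acks == 0:
        # Producer does not wait for the leader at all.
        return True
    if acks == 1:
        # Leader acknowledges as soon as the message is in its own log.
        return written_to_leader_log
    if acks in ("all", -1):
        # Leader acknowledges only after every in-sync replica has the message.
        return written_to_leader_log and in_sync_replicas <= replicas_acked
    raise ValueError(f"unsupported acks value: {acks!r}")


isr = {2, 3}  # follower broker ids currently in sync (made-up values)
print(producer_send_complete(1, True, isr, set()))    # True: followers not awaited
print(producer_send_complete("all", True, isr, {2}))  # False: broker 3 has not acked yet
```

The trade-off is latency against durability: each stronger level waits for more copies of the message to exist before the producer moves on.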
  • 33. Consumer Offset Management ● < 0.8.2: Zookeeper only ○ Zookeeper not meant for heavy write => scalability issues ● >= 0.8.2: Kafka Topic (__consumer_offsets) ○ Configurable: offsets.storage=kafka ● Documentation shows how to migrate offsets from Zookeeper to Kafka: http://kafka.apache.org/082/documentation.html#offsetmigration
  • 34. Data Retention 3 ways to configure it: ● Time based ● Size based ● Log compaction based
      Broker Configuration
        log.retention.bytes={-1|...}
        log.retention.{ms,minutes,hours}=...
        log.retention.check.interval.ms=...
        log.cleanup.policy={delete|compact}
        log.cleaner.enable={false|true}
        log.cleaner.threads=1
        log.cleaner.io.max.bytes.per.second=Double.MaxValue
        log.cleaner.backoff.ms=15000
        log.cleaner.delete.retention.ms=1d
      Topic Configuration
        cleanup.policy=...
        delete.retention.ms=...
        ...
      Reconfiguring a Topic at Runtime
        kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic --config max.message.bytes=128000
        kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic --deleteConfig max.message.bytes
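Time-based and size-based retention both work by dropping the oldest log segments first. The sketch below models that idea under simplifying assumptions: segments are `(last_modified_ms, size_bytes)` tuples ordered oldest first, the newest (active) segment is never dropped, and `apply_retention` is a hypothetical function, not the broker's implementation.

```python
def apply_retention(segments: list, now_ms: int,
                    retention_ms=None, retention_bytes: int = -1) -> list:
    """Return the segments that survive time-based and size-based retention."""
    kept = list(segments)
    # Time based: drop oldest segments whose age exceeds retention_ms.
    if retention_ms is not None:
        while len(kept) > 1 and now_ms - kept[0][0] > retention_ms:
            kept.pop(0)
    # Size based: drop oldest segments until the total fits (-1 means unlimited).
    if retention_bytes >= 0:
        while len(kept) > 1 and sum(size for _, size in kept) > retention_bytes:
            kept.pop(0)
    return kept


segments = [(0, 100), (1000, 100), (2000, 100)]
print(apply_retention(segments, now_ms=2500, retention_ms=2000))
print(apply_retention(segments, now_ms=2500, retention_bytes=150))
```

Log compaction, the third option, is different in kind: it keeps the latest record per key instead of discarding whole segments by age or size.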
  • 35. Log Compaction (1/4) Source: http://www.shayne.me/blog/2015/2015-06-25-everything-about-kafka-part-2/
      $ kafka-topics.sh --zookeeper localhost:2181 --create --topic employees --replication-factor 1 --partitions 1 --config cleanup.policy=compact
      $ echo '00158,{"name":"Jeff", "title":"Developer"}' | kafka-console-producer.sh --broker-list localhost:9092 --topic employees --property parse.key=true --property key.separator=, --new-producer
      $ echo '00223,{"name":"Chris", "title":"Senior Developer"}' | kafka-console-producer.sh --broker-list localhost:9092 --topic employees --property parse.key=true --property key.separator=, --new-producer
  • 36. Log Compaction (2/4) Source: http://www.shayne.me/blog/2015/2015-06-25-everything-about-kafka-part-2/
      $ echo '00158,{"name":"Jeff", "title":"Senior Developer"}' | kafka-console-producer.sh --broker-list localhost:9092 --topic employees --property parse.key=true --property key.separator=, --new-producer
      $ kafka-console-consumer.sh --zookeeper localhost:2181 --topic employees --from-beginning --property print.key=true --property key.separator=,
      00158,{"name":"Jeff", "title":"Developer"}
      00223,{"name":"Chris", "title":"Senior Developer"}
      00158,{"name":"Jeff", "title":"Senior Developer"}
  • 37. Log Compaction (3/4) Source: http://www.shayne.me/blog/2015/2015-06-25-everything-about-kafka-part-2/
      $ kafka-topics.sh --zookeeper localhost:2181 --alter --topic employees --config segment.ms=30000
      <... Wait 30 seconds ...>
      $ kafka-console-consumer.sh --zookeeper localhost:2181 --topic employees --from-beginning --property print.key=true --property key.separator=,
      00158,{"name":"Jeff", "title":"Developer"}
      00223,{"name":"Chris", "title":"Senior Developer"}
      00158,{"name":"Jeff", "title":"Senior Developer"}
  • 38. Log Compaction (4/4) Source: http://www.shayne.me/blog/2015/2015-06-25-everything-about-kafka-part-2/
      $ echo '00301,{"name":"Dan", "title":"Project Manager"}' | kafka-console-producer.sh --broker-list localhost:9092 --topic employees --property parse.key=true --property key.separator=, --new-producer
      $ kafka-console-consumer.sh --zookeeper localhost:2181 --topic employees --from-beginning --property print.key=true --property key.separator=,
      00223,{"name":"Chris", "title":"Senior Developer"}
      00158,{"name":"Jeff", "title":"Senior Developer"}
      00301,{"name":"Dan", "title":"Project Manager"}
      ...
      [2015-06-25 18:24:08,102] INFO [kafka-log-cleaner-thread-0], Log cleaner thread 0 cleaned log employees-0 (dirty section = [0, 3]) 0.0 MB of log processed in 0.1 seconds (0.0 MB/sec). Indexed 0.0 MB in 0.1 seconds (0.0 Mb/sec, 90.6% of total time) Buffer utilization: 0.0% Cleaned 0.0 MB in 0.0 seconds (0.0 Mb/sec, 9.4% of total time) Start size: 0.0 MB (3 messages) End size: 0.0 MB (2 messages) 31.0% size reduction (33.3% fewer messages) (kafka.log.LogCleaner)
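The effect of the four compaction slides can be reproduced with a small in-memory model: for each key only the most recent record is kept, and surviving records stay in their original offset order. This is a sketch of the idea only; the real log cleaner works segment by segment and leaves the active segment alone, and `compact` is a hypothetical function.

```python
def compact(records: list) -> list:
    """Keep only the latest record for each key, preserving log order."""
    last_offset = {}
    for offset, (key, _value) in enumerate(records):
        last_offset[key] = offset  # later offsets overwrite earlier ones
    return [
        record
        for offset, record in enumerate(records)
        if last_offset[record[0]] == offset
    ]


employees = [
    ("00158", '{"name":"Jeff", "title":"Developer"}'),
    ("00223", '{"name":"Chris", "title":"Senior Developer"}'),
    ("00158", '{"name":"Jeff", "title":"Senior Developer"}'),
    ("00301", '{"name":"Dan", "title":"Project Manager"}'),
]

for key, value in compact(employees):
    print(f"{key},{value}")
```

The output matches the final consumer listing in the demo: Jeff's first record is gone, and the remaining three records appear in log order.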
  • 39. Kafka Performance - Theory ● Efficient Storage ○ Fast sequential write and read ○ Leverages OS page cache (i.e. RAM) ○ Avoids storing data twice in JVM and in OS cache (better perf on startup) ○ Caches 28-30GB data in 32GB machine ○ Zero-copy I/O using the sendfile API (see the IBM developerWorks article: https://www.ibm.com/developerworks/library/j-zerocopy/) ● Batching of messages + compression ● Broker doesn’t hold client state ● Dependent on persistence guarantees (request.required.acks)
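Why batching and compression go together can be seen with a quick experiment: compressing many similar messages as one batch exploits the redundancy between them, which per-message compression cannot. The message contents and the use of zlib here are illustrative assumptions, not the codecs or data from the talk.

```python
import json
import zlib

# Made-up activity events; real payloads will compress differently.
messages = [
    json.dumps({"user": i, "event": "page_view", "page": "/home"}).encode()
    for i in range(100)
]


def compressed_individually(msgs: list) -> int:
    """Total bytes when every message is compressed on its own."""
    return sum(len(zlib.compress(m)) for m in msgs)


def compressed_as_batch(msgs: list) -> int:
    """Total bytes when the messages are joined and compressed once."""
    return len(zlib.compress(b"\n".join(msgs)))


raw = sum(len(m) for m in messages)
print("raw bytes:            ", raw)
print("compressed one by one:", compressed_individually(messages))
print("compressed as a batch:", compressed_as_batch(messages))
```

Batching also amortises network round trips and per-request overhead, independently of compression.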
  • 40. Kafka Benchmark (1/5) ● 0.8.1 ● Setup ○ 6 Machines ■ Intel Xeon 2.5 GHz processor with six cores ■ Six 7200 RPM SATA drives (822 MB/sec of linear disk I/O) ■ 32GB of RAM ■ 1Gb Ethernet ○ 3 nodes for brokers + 3 for ZK and clients Source: https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
  • 41. Kafka Benchmark (2/5) Source: https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines Producer throughput:
      Use case | Messages/sec | Throughput (MB/sec)
      1 producer thread, no replication; 6 partitions, 50M messages, payload: 100 bytes | 821,557 | 78.3
      1 producer thread, 3x asynchronous replication (ack=1) | 786,980 | 75.1
      1 producer thread, 3x synchronous replication (ack=all) | 421,823 | 40.2
      3 producers, 3x asynchronous replication | 2,024,032 | 193.0
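The two numeric columns of the producer table are related through the 100-byte payload. The check below assumes that MB means 2^20 bytes, which is the reading under which the slide's figures line up; `mb_per_sec` is a hypothetical helper.

```python
BYTES_PER_MB = 2 ** 20  # assumption: the benchmark reports binary megabytes


def mb_per_sec(messages_per_sec: int, payload_bytes: int) -> float:
    """Payload throughput implied by a message rate and a fixed payload size."""
    return messages_per_sec * payload_bytes / BYTES_PER_MB


for label, rate in [
    ("1 producer, no replication", 821_557),
    ("1 producer, 3x async replication", 786_980),
    ("1 producer, 3x sync replication", 421_823),
    ("3 producers, 3x async replication", 2_024_032),
]:
    print(f"{label}: {mb_per_sec(rate, 100):.1f} MB/sec")
```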
  • 42. Kafka Benchmark (3/5) Source: https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines Sustained producer throughput
  • 43. Kafka Benchmark (4/5) Source: https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines Consumer throughput from 6 partitions, 3x replication:
      Use case | Messages/sec | Throughput (MB/sec)
      1 Consumer | 940,521 | 89.7
      3 Consumers (1 per machine) | 2,615,968 | 249.5
  • 44. Kafka Benchmark (5/5) Source: https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines End-to-end latency: ● 2ms (median) ● 3ms (99th percentile) ● 14ms (99.9th percentile)
  • 45. Kafka Demo 2 - Java High Level API ● Start a multi broker cluster + replicated topic ○ Write Java Partitioned Producer ○ Write Multi-Threaded Consumer ● Unit testing with Kafka
  • 46. Kafka Versions 0.8.2 (2014.12) ● New Producer API ● Delete topic ● Scalable offset writes 0.9.x (2015.10) ● Security ○ Encryption ○ Kerberos ○ ACLs ● Quotas (client rate control) ● Kafka Connect 0.10.x (2016.03) ● Kafka Streams ● Rack Awareness ● More SASL features ● Timestamp in messages ● API to better manage Connectors
  • 47. Starting with Kafka: Jay Kreps’ Recommendations ● Start with a single cluster ● Only a few non-critical, limited use cases ● Pick a single data format for a given organisation ○ Avro ■ Good language support ■ One schema per topic (message validation, documentation...) ■ Supports schema evolution ■ Data embeds schema ■ Makes data scientists' job easier ■ Put some thought into field naming conventions Source: http://www.confluent.io/blog/stream-data-platform-2/
  • 48. Conclusions ● Easy to start with for a PoC ● Maybe not so easy to build a production system from scratch ● Must have serious monitoring in place (see Yahoo, Confluent, DataDog) ● Vibrant community, fast-paced technology ● Videos of Kafka Summit are online: http://kafka-summit.org/sessions/ https://github.com/samuel-kerrien/kafka-demo