Bringing Kafka Without ZooKeeper
Into Production
Colin McCabe
Principal Engineer
Confluent
About Me
I work on Apache Kafka at
Confluent.
Kafka Committer & PMC Member
Table of Contents
● KRaft Architecture
● Deploying KRaft
● Rolling KRaft Clusters
● Troubleshooting KRaft Clusters
● Upgrading from ZooKeeper
● Roadmap
● Q&A
KRaft Architecture
Removing ZooKeeper
Current: ZK-Based Architecture
New: KRaft-Based Architecture
How Kafka Uses ZooKeeper
Stores most persistent
metadata in ZooKeeper
● Topics
● Partitions
● Configurations
● Quotas
● ACLs
Also uses ZooKeeper for
coordinating cluster
membership.
/brokers/ids/0
/brokers/ids/1
/brokers/topics/foo
/brokers/topics/foo/partitions/0/state
/brokers/topics/foo/partitions/1/state
/brokers/topics/foo/partitions/2/state
...
Controller Startup with ZK
● One broker wins the election in ZooKeeper to be the controller
● The controller loads the full metadata
● The controller sends UpdateMetadataRequest and LeaderAndIsrRequest for all partitions
Problems with ZK-based Controller Startup
● Have to load all metadata synchronously on startup:
○ O(num_partitions), O(num_brokers)
○ Controller is unavailable during this time
■ Cold start: cluster unavailable.
■ Controller restart: admin ops and ISR changes unavailable
● Have to send all metadata to all brokers on startup
3 minutes
How KRaft Replaces ZooKeeper
● Instead of ZooKeeper, we have an internal
topic named __cluster_metadata
○ Replicated with KRaft
○ Single partition
○ The leader is the active controller
● KRaft: Raft for Kafka
○ The Raft protocol implemented
in Kafka
■ Records committed by a
majority of nodes
○ Self-managed quorum
■ Doesn’t rely on an external
system for leader election
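To make "committed by a majority of nodes" concrete, here is a minimal sketch that finds the highest offset replicated on a majority of voters. The function name and the sample offsets are hypothetical, and the model is a simplification of Raft (it ignores leader epochs); it is not Kafka's actual implementation.

```python
def committed_offset(log_end_offsets):
    """Return the highest offset that a majority of voters have replicated.

    log_end_offsets: last replicated offset of each voter (hypothetical input).
    """
    majority = len(log_end_offsets) // 2 + 1
    # After sorting in descending order, the value at index majority - 1
    # is held by at least `majority` voters.
    ranked = sorted(log_end_offsets, reverse=True)
    return ranked[majority - 1]


# Three voters: two of them have replicated up to offset 10437.
print(committed_offset([10439, 10437, 10120]))
```

With three voters, one slow voter does not hold back the committed offset, which is why the quorum keeps working while a minority of controllers is down.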
Metadata Records
● Binary records containing
metadata
○ KRPC format
○ Auto-generated from
protocol schemas
○ Can also be translated
into JSON for human
readability
● Two ways to evolve format
○ New record versions
○ Tagged fields
● Some records are deltas that
apply changes to existing
state
{
"type": "REGISTER_BROKER_RECORD",
"version": 0,
"data": {
"brokerId": 1,
"incarnationId": "P3UFsWoNR-erL9PK98YLsA",
"brokerEpoch": 0,
"endPoints": [
{
"name": "PLAINTEXT",
"host": "localhost",
"port": 9092,
"securityProtocol": 0
}
],
"features": [],
"rack": null
}
}
Metadata as An Ordered Log
TopicRecord(name=foo, id=rtkInsMkQPiEBj6uz67rrQ)
PartitionRecord(id=rtkInsMkQPiEBj6uz67rrQ, index=0, …)
PartitionRecord(id=rtkInsMkQPiEBj6uz67rrQ, index=1, …)
PartitionRecord(id=rtkInsMkQPiEBj6uz67rrQ, index=2, …)
ConfigRecord(name=num.io.threads, value=...)
RegisterBrokerRecord(id=4, endpoints=…, …)
[Figure: metadata log with record offsets 10434 to 10439]
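Because the metadata is an ordered log, any node can rebuild the same state by replaying records in offset order. The sketch below illustrates the idea with the record types named on the slide. The dictionary layout, the offsets, and the config value "8" are assumptions made for illustration; real records carry more fields.

```python
def replay(log):
    """Apply (offset, record) pairs in offset order to build an in-memory image."""
    image = {"topics": {}, "configs": {}, "brokers": {}, "last_offset": -1}
    for offset, record in sorted(log, key=lambda entry: entry[0]):
        kind, data = record["type"], record["data"]
        if kind == "TopicRecord":
            image["topics"][data["id"]] = {"name": data["name"], "partitions": []}
        elif kind == "PartitionRecord":
            # A delta: it extends a topic created by an earlier record.
            image["topics"][data["id"]]["partitions"].append(data["index"])
        elif kind == "ConfigRecord":
            image["configs"][data["name"]] = data["value"]
        elif kind == "RegisterBrokerRecord":
            image["brokers"][data["id"]] = data
        image["last_offset"] = offset
    return image


TOPIC_ID = "rtkInsMkQPiEBj6uz67rrQ"
example_log = [  # offsets here are illustrative only
    (0, {"type": "TopicRecord", "data": {"name": "foo", "id": TOPIC_ID}}),
    (1, {"type": "PartitionRecord", "data": {"id": TOPIC_ID, "index": 0}}),
    (2, {"type": "PartitionRecord", "data": {"id": TOPIC_ID, "index": 1}}),
    (3, {"type": "PartitionRecord", "data": {"id": TOPIC_ID, "index": 2}}),
    (4, {"type": "ConfigRecord", "data": {"name": "num.io.threads", "value": "8"}}),
    (5, {"type": "RegisterBrokerRecord", "data": {"id": 4}}),
]
print(replay(example_log)["topics"][TOPIC_ID])
```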
Controller Startup with KRaft
● The blue nodes are designated KRaft controllers
● The KRaft controllers will elect a single leader
● The leader will have all previously committed records
● The newly elected KRaft controller is ready immediately
● Brokers fetch only the metadata they need
● New brokers and brokers that are behind fetch snapshots
Controller Failover
ZK Mode
● Win controller election
● Load all topics and partitions
● Send LeaderAndIsr + UpdateMetadata to all brokers
KRaft Mode
● Win KRaft election
● Start handling requests from brokers
Rolling Nodes
ZK Mode
● Restarted node begins with no metadata. Must wait for full metadata update RPCs from controller.
KRaft Mode
● Restarted node consults its local metadata cache, transfers only what it doesn’t have, based on metadata offset.
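The KRaft behavior can be sketched as a small decision function: compare the restarted node's last applied metadata offset with what the controller still retains in its log. The function and parameter names are hypothetical, and the model is simplified; it only illustrates the offset-based choice between fetching nothing, fetching the missing records, or loading a snapshot.

```python
def plan_fetch(local_offset, controller_log_start, controller_log_end):
    """Decide what a restarted node must transfer (simplified model).

    local_offset: last metadata offset in the node's local cache (-1 if empty).
    controller_log_start / controller_log_end: first and last offsets still
    available as individual records on the controller.
    """
    if local_offset >= controller_log_end:
        return ("nothing", None)
    if local_offset + 1 < controller_log_start:
        # The records this node needs are no longer in the log,
        # so it has to start from a snapshot instead.
        return ("snapshot", None)
    return ("records", range(local_offset + 1, controller_log_end + 1))


print(plan_fetch(10430, 10000, 10439))  # node is nine records behind
print(plan_fetch(-1, 10000, 10439))     # brand-new node
```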
Calling an Admin API with ZK
● External client connects to a random broker
● Broker alters ZooKeeper
○ Example: create znodes for new topic
● Znode watch triggers
● Controller loads changes
● Controller pushes out changes to brokers
○ Incremental LeaderAndIsr
○ Incremental UpdateMetadata
ZK-based Programming Model
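[Sequence diagram between the request handler, ZooKeeper, and the controller thread: the request handler creates /brokers/topics/foo; ZooKeeper reports that /brokers/topics changed; the controller thread lists /brokers/topics and /brokers/topics/foo and computes the changes]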
Problems
● What if the controller fails to send an update to a specific broker?
○ Metadata divergence
○ No easy way to know what we have and what we don’t have
● Difficult programming model
○ Multiple writers to ZooKeeper
○ Can’t assume your cache is up-to-date!
● ZK is the bottleneck for all admin operations
○ Often 1 admin operation leads to many ZK operations
○ Blocking
Calling an Admin API with KRaft
● External client connects to a random broker
● Broker forwards the request to the active controller for processing
● Controller creates metadata records and persists them to __cluster_metadata
● Once the records have been committed to the metadata log, the active controller returns the result to the forwarding broker, which returns it to the external client
● Brokers are continuously fetching metadata from the active controller.
● They become aware of the admin changes by reading the new metadata records.
KRaft-based Programming Model
[Sequence diagram between the request handler, the controller, and the Raft quorum: a CreateTopicsRequest arrives; the controller writes new records; the last stable offset advances; a CreateTopicsResponse is returned]
KRaft-based Programming Model: Pipelining
[Sequence diagram between the request handler, the controller, and the Raft quorum, illustrating pipelining: CreateTopicsRequest; write new records; last stable offset advances; CreateTopicsResponse]
Admin Operations in KRaft
● Pull-based metadata propagation model can recover from RPC send
failures easily
● Simpler programming model: single-writer
● Pipelining means that we can have multiple metadata operations in
flight at once
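A minimal sketch of the single-writer, pipelined model: the controller appends records for each request right away, remembers which offset each response is waiting on, and completes responses as the committed offset advances. The class and method names are hypothetical and this is not the Kafka controller's code.

```python
from collections import deque


class PipelinedController:
    """Toy single-writer controller with several operations in flight."""

    def __init__(self):
        self.log = []            # only this controller appends here
        self.pending = deque()   # (last_offset, request_name) in append order
        self.completed = []      # requests whose responses have been sent

    def handle(self, request_name, records):
        # Write the records immediately; do not block waiting for the commit.
        self.log.extend(records)
        self.pending.append((len(self.log) - 1, request_name))

    def on_commit(self, committed_offset):
        # Respond to every request whose last record is now committed.
        while self.pending and self.pending[0][0] <= committed_offset:
            self.completed.append(self.pending.popleft()[1])


controller = PipelinedController()
controller.handle("create foo", ["TopicRecord", "PartitionRecord"])
controller.handle("create bar", ["TopicRecord"])  # second op starts before first commits
controller.on_commit(1)
print(controller.completed)
```

Because there is one writer and one ordered log, responses complete in the same order in which the records were appended.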
Deploying KRaft
Getting KRaft
● The latest Apache (and Confluent) releases support both ZK mode and
KRaft mode
○ KRaft mode is supported from the same code base – not a branch
● KRaft mode is not yet recommended for production
○ In Apache 2.8, KRaft was in early access
○ As of Apache 3.0, KRaft is in preview
○ Definitely use the latest version if you are trying it out – many bugs
have been fixed
● ZooKeeper mode will be supported for a while
○ I’ll discuss the release plans in more detail in the Roadmap section of
this talk
Creating a New KRaft Cluster
● Generate a new random UUID with the kafka-storage tool
● On each node, use the kafka-storage tool to format storage directories
● Start all nodes
● First two steps are new!
$ kafka-storage.sh random-uuid
MX1HbXq8TFymY2SYR1hGYg
$ kafka-storage.sh format \
    -c ./config/kraft/controller.properties \
    --cluster-id MX1HbXq8TFymY2SYR1hGYg
Formatting /tmp/kraft-controller-logs
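For tooling that needs to produce or validate a cluster ID outside the kafka-storage tool, the sketch below assumes the ID is a 16-byte UUID encoded as URL-safe base64 without padding, which is consistent with the 22-character sample ID above. That encoding is an assumption here, and the helper names are hypothetical; prefer the output of kafka-storage itself where possible.

```python
import base64
import uuid


def random_cluster_id():
    """Encode a random UUID as unpadded URL-safe base64 (assumed format)."""
    raw = uuid.uuid4().bytes
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_cluster_id(cluster_id):
    """Decode an ID in the assumed format back into a uuid.UUID."""
    padded = cluster_id + "=" * (-len(cluster_id) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))


new_id = random_cluster_id()
print(new_id, len(new_id))
print(decode_cluster_id("MX1HbXq8TFymY2SYR1hGYg"))
```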
The Role of Formatting
● In ZK mode, brokers auto-format storage directories if they are blank on
startup.
● In KRaft mode, we must format each broker and controller before starting
the nodes.
● Why not auto-format?
○ Avoid admin mistakes or filesystem issues that could lose data
■ If a volume is not mounted, the directory may appear empty
■ The same reasons why databases and filesystems require a
formatting step.
○ We need a way to bootstrap security-related dynamic configurations
that the quorum and the brokers need to function
■ SCRAM configurations
■ Dynamic SSL configurations
■ Metadata version
Bootstrapping
ZK Mode
● Start up ZooKeeper cluster
● Set SCRAM users, dynamic security configs, etc. in ZK
● Start brokers
KRaft Mode
● Generate cluster ID
● Run format tool on all nodes, specifying cluster ID, SCRAM, security configs, metadata version, etc.
● Start nodes
KRaft Controllers
● A small pool of nodes that we designate ahead of time
○ Hot standbys for the active controller
○ Single log directory for __cluster_metadata
● Sizing is very similar to ZooKeeper
○ Typically 3 controller nodes
○ May use more to provide more resilience
○ Just like with ZK, need to keep a majority of nodes alive
● Can be configured by copying over broker configuration file
○ Set process.roles to controller
○ Set node.id
Deploying KRaft Controllers
Combined Mode
● Controller processes in the same JVM as broker processes
● Shared IDs, shared JVMs, shared nodes
● Can have single JVM Kafka cluster
Separate Mode
● Controller processes on separate nodes
● Different IDs, different JVMs, different nodes
● Better isolation, easier rolls
Separate Mode with Kubernetes Bin-Packing
● Controllers in separate k8s pods
● Different IDs, different JVMs, maybe shared nodes
Some Considerations for Deploying Controllers
● Isolation
● Avoid co-location in busy clusters
● Try to avoid situations where
traffic from outside disrupts
internals
● High Availability
● Try not to have multiple
controllers in a single failure
domain
● In clusters which span 3
availability zones, put a controller
in each zone
New Controller Endpoint
[Diagram: a broker with EXTERNAL and INTERBROKER listeners, and a controller with a CONTROLLER listener]
● Only exposed on
controllers
● Not an advertised
listener or broker listener
● Should not be accessible
externally
Configuring KRaft Clusters
● node.id
○ Replaces broker.id on both brokers and controllers.
● process.roles
○ broker
○ controller
○ broker,controller
● controller.quorum.voters
○ 1@controller1:9093,2@controller2:9093,3@controller3:9093
○ Statically configured
● controller.listener.names
○ Tell brokers and other controllers how to connect to the controller
○ These are not advertised listeners and cannot appear in
“advertised.listeners” on the broker
○ On the controller they appear in “listeners”
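Since controller.quorum.voters is a statically configured string, a small validation step in deployment tooling can catch malformed entries before nodes start. The sketch below parses the id@host:port format shown above and reports how many controller failures the quorum can tolerate. The function names are hypothetical and this is not part of Kafka.

```python
def parse_voters(value):
    """Parse 'id@host:port,...' into {node_id: (host, port)}."""
    voters = {}
    for entry in value.split(","):
        node_id, _, endpoint = entry.strip().partition("@")
        host, _, port = endpoint.rpartition(":")
        if not node_id.isdigit() or not host or not port.isdigit():
            raise ValueError(f"malformed voter entry: {entry!r}")
        voters[int(node_id)] = (host, int(port))
    return voters


def tolerated_failures(voters):
    """A majority must stay alive, so (n - 1) // 2 voters may fail."""
    return (len(voters) - 1) // 2


quorum = parse_voters("1@controller1:9093,2@controller2:9093,3@controller3:9093")
print(quorum)
print(tolerated_failures(quorum))
```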
Rolling KRaft Clusters
Rolling KRaft Controllers
● Much lighter roll process
○ No cluster-destabilizing full metadata updates to send
● KRaft controllers can be rolled very quickly compared to brokers
○ They do not manage lots of data directories
○ Make sure that monitoring software can keep up
● KRaft controller software can be upgraded on a separate
schedule
○ Forwards and backwards compatibility with alternate broker
versions
Monitoring Rolls
● ZK Mode
○ Under-replicated partitions
○ ActiveControllerCount
○ LeaderAndIsr request time
○ UpdateMetadata request time
○ ZooKeeperExpiresPerSec
● KRaft Mode
○ Under-replicated partitions
■ Shows broker health
○ ActiveControllerCount
○ MetadataOffset
○ SnapshotLag
○ SnapshotSizeBytes
○ MetadataCommitRate
○ MetadataCommitLatency
○ Current Raft state (follower,
leader, candidate, observer)
■ Shows controller health
○ FencedBrokerCount
○ ActiveBrokerCount
Creating New Partitions During Cluster Rolls in ZK
● In ZK mode, the controller “forgets about” brokers that are temporarily down during a roll
● In a 3-node cluster, if one node is rolling, we don’t have enough nodes to create a partition with 3x replication.
● Must run with 4 nodes, even if we don’t need all 4.
[Diagram: a new partition needs Replica 1, Replica 2, and Replica 3 while one of the three brokers is rolling]
Creating New Partitions During Cluster Rolls in KRaft
● In KRaft mode, the controller remembers brokers that are temporarily down during a roll
● In KRaft mode, those brokers enter “fenced state”
○ Can place new replicas on these nodes if needed
○ Brokers will be unfenced once they come up
● Many small clusters can now use 3 nodes rather than 4.
[Diagram: a new partition needs Replica 1, Replica 2, and Replica 3 while one of the three brokers is rolling]
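One way to picture placement with fenced brokers: prefer unfenced brokers, but fall back to fenced ones when there are not enough. The sketch below is only an illustration of that idea with hypothetical names; it is not Kafka's actual replica placement algorithm.

```python
def place_replicas(fenced_by_broker, replication_factor):
    """Pick brokers for a new partition, preferring unfenced brokers.

    fenced_by_broker: {broker_id: True if the broker is currently fenced}.
    """
    if replication_factor > len(fenced_by_broker):
        raise ValueError("not enough registered brokers")
    # False sorts before True, so unfenced brokers come first;
    # the broker id breaks ties to keep the result deterministic.
    ordered = sorted(fenced_by_broker, key=lambda b: (fenced_by_broker[b], b))
    return ordered[:replication_factor]


# Broker 2 is rolling (fenced), yet a 3x-replicated partition can still be created.
print(place_replicas({1: False, 2: True, 3: False}, 3))
```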
Handling Version Skew
● During the roll, old and new software versions
must co-exist
● Older brokers can’t understand newer APIs and
protocols
● inter.broker.protocol.version solves this in the ZK world
○ Controls what RPC protocols brokers use
when communicating with each other
○ Controls what features brokers support
● Examples
○ Whether to use AlterPartitions for ISRs
○ Whether to use topic IDs
○ Whether to use the new version of some API
(ListOffsets, FetchRequest, etc.)
KAFKA_2_4_IV0,
KAFKA_2_4_IV1,
KAFKA_2_5_IV0,
KAFKA_2_6_IV0,
KAFKA_2_7_IV0,
KAFKA_2_7_IV1,
KAFKA_2_7_IV2,
KAFKA_2_8_IV0,
KAFKA_2_8_IV1,
KAFKA_3_0_IV0,
KAFKA_3_0_IV1,
KAFKA_3_1_IV0,
KAFKA_3_2_IV0
Problems with Inter Broker Protocol in ZK
● inter.broker.protocol.version must be manually configured
○ If it is left out of the configuration file, it defaults to the latest
version, which is probably not what the user wants
● Because inter.broker.protocol.version is statically configured, it can’t be
changed without restarting all the nodes
○ To upgrade to a new version AND get the new features requires a
“double roll” in ZK mode
● No downgrade support
○ It is safe to downgrade between some pairs of versions, but not
others.
Introducing metadata.version
● In KRaft mode, inter.broker.protocol.version is replaced by metadata.version
● Each new inter.broker.protocol.version has a corresponding
metadata.version
● metadata.version is dynamically configured
○ Does not require a roll to change
○ It is changed by invoking a controller API.
○ “Guard rails”
■ The controller will refuse to do the upgrade if some brokers
have not been rolled
● Supports downgrade!
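The "guard rails" idea can be sketched as a check the controller runs before accepting a new metadata.version: if any registered broker does not support the target level, the upgrade is refused. Versions are modeled as plain integers and all names are hypothetical; this is an illustration, not the controller's real logic.

```python
def check_upgrade(target_level, max_supported_by_broker):
    """Return (allowed, blockers) for a proposed metadata.version level.

    max_supported_by_broker: {broker_id: highest level that broker supports}.
    """
    blockers = sorted(
        broker
        for broker, supported in max_supported_by_broker.items()
        if supported < target_level
    )
    return (len(blockers) == 0, blockers)


# Broker 3 has not been rolled to the new software yet, so the upgrade is refused.
print(check_upgrade(5, {1: 5, 2: 5, 3: 4}))
print(check_upgrade(5, {1: 5, 2: 5, 3: 5}))
```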
Downgrading metadata.version
● Like upgrade, downgrade can be done dynamically from the command
line
● Two kinds of downgrades
○ Safe: no metadata will be lost by the downgrade
○ Unsafe: some metadata may be cleared during the downgrade
● Unsafe downgrades require the operator to provide an override flag
Troubleshooting KRaft Clusters
● In KRaft, __cluster_metadata replaces ZooKeeper as the store of
record
● It’s often helpful to examine the metadata log to see what
happened
● All batches come with timestamps
● Offsets are uniform across the cluster
○ A specific record will have the same offset on each
controller and broker
● Several tools for looking at metadata logs
○ kafka-dump-log
○ kafka-metadata-shell
kafka-dump-log
$ ./bin/kafka-dump-log.sh \
    --cluster-metadata-decoder \
    --files /tmp/logs/__cluster_metadata-0/00000000000000000000.log
Dumping /tmp/logs/__cluster_metadata-0/00000000000000000000.log
Starting offset: 0
baseOffset: 0 lastOffset: 0 count: 1 baseSequence: -1
[...]
| offset: 0 CreateTime: 1650857270775 keySize: 4 valueSize: 19
sequence: -1 headerKeys: [] controlType: LEADER_CHANGE(2)
baseOffset: 1 lastOffset: 1 count: 1 baseSequence: -1
[...]{"type":"REGISTER_BROKER_RECORD","version":0,"data":{"...
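The CreateTime values in this output are milliseconds since the Unix epoch, which makes them hard to read when working out what happened and when. A small helper, sketched here with a hypothetical function name, converts them to UTC timestamps.

```python
import re
from datetime import datetime, timezone


def create_time(line):
    """Extract CreateTime (epoch milliseconds) from a dump line as a UTC datetime."""
    match = re.search(r"CreateTime: (\d+)", line)
    if match is None:
        return None
    millis = int(match.group(1))
    # Use integer arithmetic so the millisecond part is not subject to float rounding.
    seconds, remainder = divmod(millis, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(
        microsecond=remainder * 1000
    )


sample = "| offset: 0 CreateTime: 1650857270775 keySize: 4 valueSize: 19"
print(create_time(sample).isoformat())
```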
Interactive Metadata Shell
● Replaces zookeeper-shell in KRaft clusters
● Data sources
○ Snapshot
○ Running controller cluster
● Reads __cluster_metadata log entries into memory
● Constructs a “virtual filesystem” with the cluster’s information
● Commands available: ls, pwd, cd, find, etc.
kafka-metadata-shell
$ ./bin/kafka-metadata-shell.sh --snapshot \
    /tmp/logs/__cluster_metadata-0/00000000000000000000.log
Loading...
Starting...
[ Kafka Metadata Shell ]
>> ls
brokers configs local metadataQuorum topicIds topics
>> cat /brokers/1/registration
RegisterBrokerRecord(brokerId=1,
incarnationId=tHo3Z8dYSuONV5hA82BVug, brokerEpoch=0,
endPoints=[BrokerEndpoint(name='PLAINTEXT', host='localhost',
port=9092, securityProtocol=0)], features=[], rack=null, fenced=true)
Monitoring the Quorum
● Metrics
○ MetadataOffset
○ SnapshotLag
○ SnapshotSizeBytes
○ MetadataCommitRate
○ MetadataCommitLatency
○ Current Raft state (follower,
leader, candidate, observer)
■ Important for controller
health
● DescribeQuorum RPC
○ Leader ID
○ Leader epoch
○ High water mark
○ Current voters
■ Controllers
○ Current observers
■ Brokers
■ Possible metadata shell
instances, etc.
○ Log end offset of all followers
Upgrading from ZooKeeper Mode
Upgrading Clusters from ZK Mode
● Initially: all metadata in ZooKeeper.
● All brokers in ZK mode.
[Diagram: three brokers, all in ZK mode]
● We add a quorum of new KRaft controllers to the cluster
[Diagram: three ZK-mode brokers plus three KRaft controllers]
● The KRaft controllers elect a leader from their ranks and force that leader to be the cluster leader in ZK.
● The KRaft controller loads all metadata from ZK, and sends out the appropriate LeaderAndIsr and UpdateMetadata requests (UMRs).
● Future metadata changes go into the KRaft quorum.
● Now brokers can be rolled one by one.
● KRaft controller will still send LeaderAndIsr, etc. to un-upgraded brokers.
[Diagram: two brokers in ZK mode, one in KRaft mode, three KRaft controllers]
● This relies on brokers never directly communicating with ZK (except to register themselves)
○ KIP-591 Forwarding
[Diagram: one broker in ZK mode, two in KRaft mode, three KRaft controllers]
● Once all brokers have been rolled, ZK can be removed.
[Diagram: three brokers in KRaft mode, three KRaft controllers]
Roadmap
KRaft Project Phases
1. Design
2. Initial Implementation
3. Available for testing
4. Available for production
5. Bridge release
6. ZK removed
Design Phase
● During the design phase, we had many upstream discussions about the
goals of the KRaft project (called the KIP-500 project at the time) and how
we would evolve the project to meet them.
● September 2019: vote for “KIP-500: Replace ZooKeeper with a
Self-Managed Metadata Quorum” passes
● June 2020: vote for “KIP-591: Redirect Zookeeper Mutation Protocols to
The Controller” passes (some revisions will be made later)
● August 2020: vote for “KIP-595: A Raft Protocol for the Quorum” passes
● September 2020: vote for “KIP-631: The Quorum-Based Kafka Controller”
passes
● October 2020: vote for “KIP-630: Kafka Raft Snapshots” passes
● Several other minor KIPs filled gaps in our admin APIs which were being
filled by direct ZK access
Initial Implementation Phase
● During this phase, we put the ideas from the design phase into practice.
● The implementation was pretty unstable during this time period – many
big changes and fixes were happening all the time.
● There were many minor changes to remove ZK dependencies as well.
● Late 2020: KIP-500 work started in a branch.
● Early 2021: all code merged to trunk
○ New controller code
○ New self-managed Raft quorum implementation
● April 2021: Apache 2.8 released
○ First Apache release with KRaft
○ KRaft in Early Access
● September 2021: Apache 3.0 release
○ First release with KRaft snapshot support
○ Added reassignment and EOS support as well
○ Preview
Testing Phase
● This is where we are now.
● KRaft is in preview mode
○ Not recommended for production
○ Available for testing
● January 2022: Apache Kafka 3.1 released
○ Many bug fixes, new tests
○ Some new KRaft metrics
● May 2022: Apache Kafka 3.2 released (soon)
○ New KRaft-based authorizer
Production Phase
● Coming soon!
● During this phase, we will recommend KRaft for production use.
○ Also known as “going GA”
● There may still be some feature gaps
○ SCRAM
○ JBOD support
○ Delegation token support
● Most importantly, we will not have support for upgrading from ZK yet in
this phase.
○ Therefore, in this phase, KRaft will be useful for new clusters, but not
for existing ones
Bridge Release
● A bridge release is a release that
supports upgrading from ZK mode
to KRaft mode.
○ As described in the section on
upgrading from ZK, this
requires brokers to channel
their admin changes through
the controller, rather than
mutating ZK directly.
● We will make multiple bridge
releases
● We will work hard to close all the
remaining feature gaps during this
phase, so that there are no longer
any users stuck on ZK.
Last Phase: Removing ZK Support
● Eventually, we will want to remove ZK support so that we no longer have
to maintain two controller implementations
● Before we can remove ZK support, we have to deprecate it, and then
make a new major release
● Timeline for full removal is TBD
Thank You!
Colin McCabe
cmccabe@apache.org
confluent
 
Scaling Up Logging and Metrics
Scaling Up Logging and MetricsScaling Up Logging and Metrics
Scaling Up Logging and Metrics
Ricardo Lourenço
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No Keeper
C4Media
 
OpenTelemetry For Operators
OpenTelemetry For OperatorsOpenTelemetry For Operators
OpenTelemetry For Operators
Kevin Brockhoff
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kevin Lynch
 
Clusternaut: Orchestrating  Percona XtraDB Cluster with Kubernetes
Clusternaut:  Orchestrating  Percona XtraDB Cluster with KubernetesClusternaut:  Orchestrating  Percona XtraDB Cluster with Kubernetes
Clusternaut: Orchestrating  Percona XtraDB Cluster with Kubernetes
Raghavendra Prabhu
 
Tips & Tricks for Apache Kafka®
Tips & Tricks for Apache Kafka®Tips & Tricks for Apache Kafka®
Tips & Tricks for Apache Kafka®
confluent
 
LCA14: LCA14-412: GPGPU on ARM SoC session
LCA14: LCA14-412: GPGPU on ARM SoC sessionLCA14: LCA14-412: GPGPU on ARM SoC session
LCA14: LCA14-412: GPGPU on ARM SoC session
Linaro
 
Kubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the DatacenterKubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the Datacenter
Kevin Lynch
 
Plny12 galera-cluster-best-practices
Plny12 galera-cluster-best-practicesPlny12 galera-cluster-best-practices
Plny12 galera-cluster-best-practices
Dimas Prasetyo
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
confluent
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
Samuel Kerrien
 
Testing Persistent Storage Performance in Kubernetes with Sherlock
Testing Persistent Storage Performance in Kubernetes with SherlockTesting Persistent Storage Performance in Kubernetes with Sherlock
Testing Persistent Storage Performance in Kubernetes with Sherlock
ScyllaDB
 
WKSctl: Gitops Management of Kubernetes Clusters
WKSctl: Gitops Management of Kubernetes ClustersWKSctl: Gitops Management of Kubernetes Clusters
WKSctl: Gitops Management of Kubernetes Clusters
Weaveworks
 
Container Orchestration from Theory to Practice
Container Orchestration from Theory to PracticeContainer Orchestration from Theory to Practice
Container Orchestration from Theory to Practice
Docker, Inc.
 
Kubecon seattle 2018 workshop slides
Kubecon seattle 2018 workshop slidesKubecon seattle 2018 workshop slides
Kubecon seattle 2018 workshop slides
Weaveworks
 
Kernel Recipes 2015 - So you want to write a Linux driver framework
Kernel Recipes 2015 - So you want to write a Linux driver frameworkKernel Recipes 2015 - So you want to write a Linux driver framework
Kernel Recipes 2015 - So you want to write a Linux driver framework
Anne Nicolas
 

Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Summit London 2022

  • 1. Bringing Kafka Without ZooKeeper Into Production
      Colin McCabe, Principal Engineer, Confluent
  • 2. About Me
      ● I work on Apache Kafka at Confluent.
      ● Kafka Committer & PMC Member
  • 3. Table of Contents
      ● KRaft Architecture
      ● Deploying KRaft
      ● Rolling KRaft Clusters
      ● Troubleshooting KRaft Clusters
      ● Upgrading from ZooKeeper
      ● Roadmap
      ● Q&A
  • 6. How Kafka Uses ZooKeeper
      ● Stores most persistent metadata in ZooKeeper
        ○ Topics
        ○ Partitions
        ○ Configurations
        ○ Quotas
        ○ ACLs
      ● Also uses ZooKeeper for coordinating cluster membership.
      ● Example znodes:
        /brokers/ids/0
        /brokers/ids/1
        /brokers/topics/foo
        /brokers/topics/foo/partitions/0/state
        /brokers/topics/foo/partitions/1/state
        /brokers/topics/foo/partitions/2/state
        ...
  • 8. Controller Startup with ZK
      ● One broker wins the election in ZooKeeper to be the controller
  • 10. Controller Startup with ZK
      ● UpdateMetadataRequest
      ● LeaderAndIsrRequest
      (Figure label: All partitions)
  • 11. Problems with ZK-based Controller Startup
      ● Have to load all metadata synchronously on startup:
        ○ O(num_partitions), O(num_brokers)
        ○ Controller is unavailable during this time
          ■ Cold start: cluster unavailable.
          ■ Controller restart: admin ops and ISR changes unavailable
      ● Have to send all metadata to all brokers on startup
      (Figure label: 3 minutes)
  • 12. How KRaft Replaces ZooKeeper
      ● Instead of ZooKeeper, we have an internal topic named __cluster_metadata
        ○ Replicated with KRaft
        ○ Single partition
        ○ The leader is the active controller
      ● KRaft: Raft for Kafka
        ○ The Raft protocol implemented in Kafka
          ■ Records committed by a majority of nodes
        ○ Self-managed quorum
          ■ Doesn’t rely on an external system for leader election
  • 13. Metadata Records
      ● Binary records containing metadata
        ○ KRPC format
        ○ Auto-generated from protocol schemas
        ○ Can also be translated into JSON for human readability
      ● Two ways to evolve format
        ○ New record versions
        ○ Tagged fields
      ● Some records are deltas that apply changes to existing state
      Example record, translated into JSON:
      { "type": "REGISTER_BROKER_RECORD", "version": 0, "data": { "brokerId": 1, "incarnationId": "P3UFsWoNR-erL9PK98YLsA", "brokerEpoch": 0, "endPoints": [ { "name": "PLAINTEXT", "host": "localhost", "port": 9092, "securityProtocol": 0 } ], "features": [], "rack": null } }
  • 14. Metadata as An Ordered Log
      Records shown in the log (offsets 10434 through 10439 on the slide):
        TopicRecord(name=foo, id=rtkInsMkQPiEBj6uz67rrQ)
        PartitionRecord(id=rtkInsMkQPiEBj6uz67rrQ, index=0, …)
        PartitionRecord(id=rtkInsMkQPiEBj6uz67rrQ, index=1, …)
        PartitionRecord(id=rtkInsMkQPiEBj6uz67rrQ, index=2, …)
        ConfigRecord(name=num.io.threads, value=...)
        RegisterBrokerRecord(id=4, endpoints=…, …)
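The idea of metadata as an ordered log can be sketched in a few lines of Python. This is only an illustration, not Kafka's implementation: the `MetadataImage` class, the dictionary-based record layout, and the pairing of each record with one of the offsets 10434 to 10439 are all assumptions made for the example. It shows how replaying delta records in offset order rebuilds the current state.

```python
from dataclasses import dataclass, field

TOPIC_ID = "rtkInsMkQPiEBj6uz67rrQ"

# (offset, record) pairs. Assigning these exact offsets to these exact
# records is an assumption; the slide only shows both side by side.
LOG = [
    (10434, {"type": "TopicRecord", "name": "foo", "id": TOPIC_ID}),
    (10435, {"type": "PartitionRecord", "id": TOPIC_ID, "index": 0}),
    (10436, {"type": "PartitionRecord", "id": TOPIC_ID, "index": 1}),
    (10437, {"type": "PartitionRecord", "id": TOPIC_ID, "index": 2}),
    # The config value is elided on the slide, so it stays elided here.
    (10438, {"type": "ConfigRecord", "name": "num.io.threads", "value": "..."}),
    (10439, {"type": "RegisterBrokerRecord", "id": 4}),
]


@dataclass
class MetadataImage:
    """Hypothetical in-memory state built by replaying records in order."""
    topics: dict = field(default_factory=dict)      # topic id -> name
    partitions: dict = field(default_factory=dict)  # topic id -> set of indexes
    configs: dict = field(default_factory=dict)     # config name -> value
    brokers: set = field(default_factory=set)       # registered broker ids
    last_offset: int = -1

    def apply(self, offset, record):
        # Each record is a delta, so order matters.
        if offset <= self.last_offset:
            raise ValueError("records must be applied in increasing offset order")
        kind = record["type"]
        if kind == "TopicRecord":
            self.topics[record["id"]] = record["name"]
            self.partitions.setdefault(record["id"], set())
        elif kind == "PartitionRecord":
            self.partitions.setdefault(record["id"], set()).add(record["index"])
        elif kind == "ConfigRecord":
            self.configs[record["name"]] = record["value"]
        elif kind == "RegisterBrokerRecord":
            self.brokers.add(record["id"])
        self.last_offset = offset


def replay(log):
    """Build a fresh image by applying every record from the start of the log."""
    image = MetadataImage()
    for offset, record in log:
        image.apply(offset, record)
    return image


if __name__ == "__main__":
    image = replay(LOG)
    print(image.topics, sorted(image.partitions[TOPIC_ID]), image.last_offset)
```

Because the state is a pure function of the log prefix, any node that has applied records up to the same offset holds the same view of the metadata.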
  • 15. Controller Startup with KRaft
      ● The blue nodes are designated KRaft controllers
  • 16. Controller Startup with KRaft
      ● The KRaft controllers will elect a single leader
      ● The leader will have all previously committed records
  • 17. Controller Startup with KRaft
      ● The newly elected KRaft controller is ready immediately
      ● Brokers fetch only the metadata they need
      ● New brokers and brokers that are behind fetch snapshots
  • 18. Controller Failover
      ZK Mode:
        ● Win controller election
        ● Load all topics and partitions
        ● Send LeaderAndIsr + UpdateMetadata to all brokers
      KRaft Mode:
        ● Win KRaft election
        ● Start handling requests from brokers
  • 19. Rolling Nodes
      ZK Mode:
        ● Restarted node begins with no metadata. Must wait for full metadata update RPCs from controller.
      KRaft Mode:
        ● Restarted node consults its local metadata cache, transfers only what it doesn’t have, based on metadata offset.
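The difference between the two restart behaviours can be illustrated with a small Python sketch. It is a toy model under stated assumptions: `fetch_since` and `BrokerCache` are hypothetical names, the log is a plain list of (offset, record) pairs, and snapshots are left out. It only shows why a node that remembers its last applied offset needs to transfer far less than a node that starts empty.

```python
def fetch_since(log, last_applied_offset):
    """Return only the records after the caller's last applied offset."""
    return [(off, rec) for off, rec in log if off > last_applied_offset]


class BrokerCache:
    """Hypothetical local metadata cache that survives a restart."""

    def __init__(self):
        self.last_offset = -1
        self.records = []

    def catch_up(self, controller_log):
        """Apply whatever is missing and report how many records were transferred."""
        new_records = fetch_since(controller_log, self.last_offset)
        for offset, record in new_records:
            self.records.append(record)
            self.last_offset = offset
        return len(new_records)


if __name__ == "__main__":
    controller_log = [(i, f"record-{i}") for i in range(5)]

    broker = BrokerCache()
    print("initial transfer:", broker.catch_up(controller_log))

    # While the broker is down for a roll, two more records are committed.
    controller_log += [(5, "record-5"), (6, "record-6")]

    # KRaft-style restart: the cache is kept, so only the delta is fetched.
    print("after restart with cache:", broker.catch_up(controller_log))

    # ZK-style restart: the node begins with no metadata and needs everything.
    print("after restart without cache:", BrokerCache().catch_up(controller_log))
```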
  • 20. Calling an Admin API with ZK
      ● External client connects to a random broker
  • 21. Calling an Admin API with ZK
      ● Broker alters ZooKeeper
      ● Example: create znodes for new topic
  • 22. Calling an Admin API with ZK
      ● Znode watch triggers
      ● Controller loads changes
  • 23. Calling an Admin API with ZK
      ● Controller pushes out changes to brokers
      ● Incremental LeaderAndIsr
      ● Incremental UpdateMetadata
  • 24. ZK-based Programming Model
      (Sequence diagram. Participants: Request Handler, ZooKeeper, Controller Thread. Steps shown: Create /brokers/topics/foo; /brokers/topics Changed; List /brokers/topics; Compute changes; List /brokers/topics/foo)
  • 25. Problems
      ● What if the controller fails to send an update to a specific broker?
        ○ Metadata divergence
        ○ No easy way to know what we have and what we don’t have
      ● Difficult programming model
        ○ Multiple writers to ZooKeeper
        ○ Can’t assume your cache is up-to-date!
      ● ZK is the bottleneck for all admin operations
        ○ Often 1 admin operation leads to many ZK operations
        ○ Blocking
  • 26. Calling an Admin API with KRaft
      ● External client connects to a random broker
  • 27. Calling an Admin API with KRaft
      ● Broker forwards the request to the active controller for processing
  • 28. Calling an Admin API with KRaft
      ● Controller creates metadata records and persists them to __cluster_metadata
  • 29. Calling an Admin API with KRaft
      ● Once the records have been committed to the metadata log, the active controller returns the result to the forwarding broker, which returns it to the external client
  • 30. Calling an Admin API with KRaft
      ● Brokers are continuously fetching metadata from the active controller.
      ● They become aware of the admin changes by reading the new metadata records.
  • 31. KRaft-based Programming Model
      (Sequence diagram. Participants: Request Handler, Controller, Raft Quorum. Steps shown: CreateTopicsRequest; Write new records; Last stable offset advances; CreateTopicsResponse)
  • 32. KRaft-based Programming Model: Pipelining
      (Sequence diagram. Participants: Request Handler, Controller, Raft Quorum. Steps shown: CreateTopicsRequest; Write new records; Last stable offset advances; CreateTopicsResponse)
  • 33. Admin Operations in KRaft
      ● Pull-based metadata propagation model can recover from RPC send failures easily
      ● Simpler programming model: single-writer
      ● Pipelining means that we can have multiple metadata operations in flight at once
  • 35. Getting KRaft 35 ● The latest Apache (and Confluent) releases support both ZK mode and KRaft mode ○ KRaft mode is supported from the same code base – not a branch ● KRaft mode is not yet recommended for production ○ In Apache 2.8, KRaft was in early access ○ As of Apache 3.0, KRaft is in preview ○ Definitely use the latest version if you are trying it out – many bugs have been fixed ● ZooKeeper mode will be supported for a while ○ I’ll discuss the release plans in more detail in the Roadmap section of this talk
  • 36. Creating a New KRaft Cluster ● First two steps are new! 36 $ kafka-storage.sh random-uuid MX1HbXq8TFymY2SYR1hGYg $ kafka-storage.sh format -c ./config/kraft/controller.properties --cluster-id MX1HbXq8TFymY2SYR1hGYg Formatting /tmp/kraft-controller-logs ● Generate a new random uuid with kafka-storage tool ● On each node, use the kafka-storage tool to format storage directories ● Start all nodes
  • 37. The Role of Formatting 37 ● In ZK mode, brokers auto-format storage directories if they are blank on startup. ● In KRaft mode, we must format each broker and controller before starting the nodes. ● Why not auto-format? ○ Avoid admin mistakes or filesystem issues that could lose data ■ If a volume is not mounted, the directory may appear empty ■ The same reasons why databases and filesystems require a formatting step. ○ We need a way to bootstrap security-related dynamic configurations that the quorum and the brokers need to function ■ SCRAM configurations ■ Dynamic SSL configurations ■ Metadata version
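The "no auto-format" rule can be sketched as a startup check that refuses to run against a blank directory instead of silently initializing it. This is an illustrative sketch, not Kafka's startup code: the helper names are hypothetical, and the marker file name `meta.properties` with a `cluster.id` key is an assumption about what the format step leaves behind.

```python
import tempfile
from pathlib import Path
from typing import Optional


def read_cluster_id(log_dir: Path) -> Optional[str]:
    """Return the cluster id recorded in log_dir, or None if it looks unformatted."""
    meta = log_dir / "meta.properties"  # assumed marker file name
    if not meta.is_file():
        return None
    for line in meta.read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "cluster.id":
            return value.strip()
    return None


def ensure_formatted(log_dir: Path, expected_cluster_id: str) -> None:
    """Raise instead of auto-formatting, so an unmounted volume is caught early."""
    found = read_cluster_id(log_dir)
    if found is None:
        raise RuntimeError(f"{log_dir} is not formatted; refusing to start")
    if found != expected_cluster_id:
        raise RuntimeError(f"{log_dir} belongs to cluster {found}")


with tempfile.TemporaryDirectory() as tmp:
    log_dir = Path(tmp)
    try:
        ensure_formatted(log_dir, "MX1HbXq8TFymY2SYR1hGYg")
    except RuntimeError as err:
        print("refused:", err)
    # Stand-in for the format step: write the marker file by hand.
    (log_dir / "meta.properties").write_text(
        "version=1\ncluster.id=MX1HbXq8TFymY2SYR1hGYg\nnode.id=1\n"
    )
    ensure_formatted(log_dir, "MX1HbXq8TFymY2SYR1hGYg")
    print("formatted, ok to start")
```

An empty directory is treated as an error here, which is the behavior that protects against the unmounted-volume case described on the slide.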
  • 38. Bootstrapping 38 ● ZK Mode ○ Start up ZooKeeper cluster ○ Set SCRAM users, dynamic security configs, etc. in ZK ○ Start brokers ● KRaft Mode ○ Generate cluster ID ○ Run format tool on all nodes, specifying cluster ID, SCRAM, security configs, metadata version, etc. ○ Start nodes
  • 39. KRaft Controllers 39 ● A small pool of nodes that we designate ahead of time ○ Hot standbys for the active controller ○ Single log directory for __cluster_metadata ● Sizing is very similar to ZooKeeper ○ Typically 3 controller nodes ○ May use more to provide more resilience ○ Just like with ZK, need to keep a majority of nodes alive ● Can be configured by copying over broker configuration file ○ Set process.roles to controller ○ Set node.id
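The "keep a majority alive" sizing rule reduces to simple arithmetic. The helper names below are illustrative only.

```python
def majority(voters: int) -> int:
    """Smallest number of controllers that forms a majority."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many controllers can be down while a majority is still alive."""
    return voters - majority(voters)


for n in (3, 4, 5):
    print(f"{n} controllers: majority={majority(n)}, can lose {tolerated_failures(n)}")
```

Going from 3 to 4 controllers does not let the quorum survive any additional failures, which is why odd sizes such as 3 or 5 are the usual choice.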
  • 40. Deploying KRaft Controllers 40 ● Combined Mode ○ Controller processes in the same JVM as broker processes ○ Shared IDs, shared JVMs, shared nodes ○ Can have single JVM Kafka cluster ● Separate Mode ○ Controller processes on separate nodes ○ Different IDs, different JVMs, different nodes ○ Better isolation, easier rolls ● Separate Mode with Kubernetes Bin-Packing ○ Controllers in separate k8s pods ○ Different IDs, different JVMs, maybe shared nodes
  • 41. Some Considerations for Deploying Controllers 41 ● Isolation ○ Avoid co-location in busy clusters ○ Try to avoid situations where traffic from outside disrupts internals ● High Availability ○ Try not to have multiple controllers in a single failure domain ○ In clusters which span 3 availability zones, put a controller in each zone
  • 42. New Controller Endpoint 42 ● Only exposed on controllers ● Not an advertised listener or broker listener ● Should not be accessible externally [Diagram: Broker and Controller nodes with EXTERNAL, INTERBROKER, and CONTROLLER listeners]
  • 43. Configuring KRaft Clusters 43 ● node.id ○ Replaces broker.id on both brokers and controllers. ● process.roles ○ broker ○ controller ○ broker,controller ● controller.quorum.voters ○ 1@controller1:9093,2@controller2:9093,3@controller3:9093 ○ Statically configured ● controller.listener.names ○ Tells brokers and other controllers how to connect to the controller ○ These are not advertised listeners and cannot appear in “advertised.listeners” on the broker ○ On the controller they appear in “listeners”
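The `controller.quorum.voters` value is a comma-separated list of `id@host:port` entries. A small parser makes the shape concrete and catches common typos. This is a sketch: `Voter` and `parse_quorum_voters` are hypothetical helpers, and the hostnames `controller1` to `controller3` are illustrative.

```python
from typing import NamedTuple


class Voter(NamedTuple):
    node_id: int
    host: str
    port: int


def parse_quorum_voters(value: str) -> list[Voter]:
    """Parse 'id@host:port,id@host:port,...' into Voter tuples."""
    voters = []
    for entry in value.split(","):
        node_id, _, endpoint = entry.strip().partition("@")
        host, _, port = endpoint.rpartition(":")
        if not node_id or not host or not port:
            raise ValueError(f"expected id@host:port, got {entry!r}")
        voters.append(Voter(int(node_id), host, int(port)))
    ids = [v.node_id for v in voters]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate node id in voter list")
    return voters


for voter in parse_quorum_voters(
    "1@controller1:9093,2@controller2:9093,3@controller3:9093"
):
    print(voter)
```

Each `node_id` in the list should match the `node.id` set on the corresponding controller.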
  • 45. Rolling KRaft Controllers 45 ● Much lighter roll process ○ No cluster-destabilizing full metadata updates to send ● KRaft controllers can be rolled very quickly compared to brokers ○ They do not manage lots of data directories ○ Make sure that monitoring software can keep up ● KRaft controller software can be upgraded on a separate schedule ○ Forwards and backwards compatibility with alternate broker versions
  • 46. Monitoring Rolls 46 ● ZK Mode ○ Under-replicated partitions ○ ActiveControllerCount ○ LeaderAndIsr request time ○ UpdateMetadata request time ○ ZooKeeperExpiresPerSec ● KRaft Mode ○ Under-replicated partitions ■ Shows broker health ○ ActiveControllerCount ○ MetadataOffset ○ SnapshotLag ○ SnapshotSizeBytes ○ MetadataCommitRate ○ MetadataCommitLatency ○ Current Raft state (follower, leader, candidate, observer) ■ Shows controller health ○ FencedBrokerCount ○ ActiveBrokerCount
  • 47. Creating New Partitions During Cluster Rolls in ZK 47 ● In ZK mode, the controller “forgets about” brokers that are temporarily down during a roll ● In a 3-node cluster, if one node is rolling, we don’t have enough nodes to create a partition with 3x replication. ● Must run with 4 nodes, even if we don’t need all 4. [Diagram: a new partition needs Replica 1, Replica 2, and Replica 3 while one node is rolling]
  • 48. Creating New Partitions During Cluster Rolls in KRaft 48 ● In KRaft mode, the controller remembers brokers that are temporarily down during a roll ● In KRaft mode, those brokers enter “fenced state” ○ Can place new replicas on these nodes if needed ○ Brokers will be unfenced once they come up ● Many small clusters can now use 3 nodes rather than 4. [Diagram: a new partition gets Replica 1, Replica 2, and Replica 3 while one node is rolling]
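The difference between the two modes during a roll can be sketched as a placement function that either ignores or remembers fenced brokers. This is a deliberately simplified illustration, not Kafka's replica placement algorithm; `place_replicas` and its parameters are hypothetical.

```python
def place_replicas(brokers: dict[int, bool], replication_factor: int,
                   allow_fenced: bool) -> list[int]:
    """Pick replica brokers; `brokers` maps broker id -> True if fenced (down)."""
    unfenced = sorted(b for b, fenced in brokers.items() if not fenced)
    fenced = sorted(b for b, fenced in brokers.items() if fenced)
    # Prefer live brokers; fall back to fenced ones only if the mode allows it.
    candidates = unfenced + (fenced if allow_fenced else [])
    if len(candidates) < replication_factor:
        raise ValueError("not enough brokers for the requested replication factor")
    return candidates[:replication_factor]


rolling_cluster = {1: False, 2: False, 3: True}  # broker 3 is mid-roll

# KRaft-like behavior: the fenced broker is still a known placement target.
print(place_replicas(rolling_cluster, 3, allow_fenced=True))

# ZK-like behavior: the controller has "forgotten" broker 3.
try:
    place_replicas(rolling_cluster, 3, allow_fenced=False)
except ValueError as err:
    print("failed:", err)
```

With fenced brokers allowed, a 3-node cluster can still create a 3x replicated partition while one node is rolling.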
  • 49. Handling Version Skew 49 ● During the roll, old and new software versions must co-exist ● Older brokers can’t understand newer APIs and protocols ● inter.broker.protocol.version solves this in the ZK world ○ Controls what RPC protocols brokers use when communicating with each other ○ Controls what features brokers support ● Examples ○ Whether to use AlterPartitions for ISRs ○ Whether to use topic IDs ○ Whether to use the new version of some API (ListOffsets, FetchRequest, etc.) ● Example versions: KAFKA_2_4_IV0, KAFKA_2_4_IV1, KAFKA_2_5_IV0, KAFKA_2_6_IV0, KAFKA_2_7_IV0, KAFKA_2_7_IV1, KAFKA_2_7_IV2, KAFKA_2_8_IV0, KAFKA_2_8_IV1, KAFKA_3_0_IV0, KAFKA_3_0_IV1, KAFKA_3_1_IV0, KAFKA_3_2_IV0
  • 50. Problems with Inter Broker Protocol in ZK 50 ● inter.broker.protocol.version must be manually configured ○ If it is left out of the configuration file, it defaults to the latest version, which is probably not what the user wants ● Because inter.broker.protocol.version is statically configured, it can’t be changed without restarting all the nodes ○ Upgrading to a new version AND getting the new features requires a “double roll” in ZK mode ● No downgrade support ○ It is safe to downgrade between some pairs of versions, but not others.
  • 51. Introducing metadata.version 51 ● In KRaft mode, inter.broker.protocol.version is replaced by metadata.version ● Each new inter.broker.protocol.version has a corresponding metadata.version ● metadata.version is dynamically configured ○ Does not require a roll to change ○ It is changed by invoking a controller API. ○ “Guard rails” ■ The controller will refuse to do the upgrade if some brokers have not been rolled ● Supports downgrade!
  • 52. Downgrading metadata.version 52 ● Like upgrade, downgrade can be done dynamically from the command line ● Two kinds of downgrades ○ Safe: no metadata will be lost by the downgrade ○ Unsafe: some metadata may be cleared during the downgrade ● Unsafe downgrades require the operator to provide an override flag
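The guard rails and the safe/unsafe distinction can be sketched as a single validation function. This is a hypothetical model, not the controller's actual logic: versions are plain integers, and `change_metadata_version`, `versions_adding_metadata`, and `allow_unsafe` are names invented for illustration.

```python
def change_metadata_version(current: int, target: int,
                            broker_max_supported: dict[int, int],
                            versions_adding_metadata: set[int],
                            allow_unsafe: bool = False) -> int:
    """Return the new version, or raise if a guard rail blocks the change."""
    if target > current:
        # Upgrade guard rail: every broker must already support the target.
        lagging = sorted(b for b, v in broker_max_supported.items() if v < target)
        if lagging:
            raise RuntimeError(f"brokers {lagging} do not support version {target}")
        return target
    # Downgrade: unsafe if any version being left behind introduced metadata.
    dropped = [v for v in range(target + 1, current + 1)
               if v in versions_adding_metadata]
    if dropped and not allow_unsafe:
        raise RuntimeError(
            f"downgrade would clear metadata added in versions {dropped}; "
            "an explicit override is required"
        )
    return target


brokers = {1: 5, 2: 5, 3: 4}  # broker 3 has not been rolled yet
try:
    change_metadata_version(4, 5, brokers, versions_adding_metadata={5})
except RuntimeError as err:
    print("upgrade refused:", err)

brokers[3] = 5  # roll finished
print(change_metadata_version(4, 5, brokers, versions_adding_metadata={5}))
print(change_metadata_version(5, 4, brokers, {5}, allow_unsafe=True))
```

The override flag is checked only on the downgrade path, which mirrors the slide's point that unsafe downgrades need explicit operator consent.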
  • 54. Troubleshooting KRaft Clusters 54 ● In KRaft, __cluster_metadata replaces ZooKeeper as the store of record ● It’s often helpful to examine the metadata log to see what happened ● All batches come with timestamps ● Offsets are uniform across the cluster ○ A specific record will have the same offset on each controller and broker ● Several tools for looking at metadata logs ○ kafka-dump-log ○ kafka-metadata-shell
  • 55. kafka-dump-log 55
      $ ./bin/kafka-dump-log.sh --cluster-metadata-decoder --files /tmp/logs/__cluster_metadata-0/00000000000000000000.log
      Dumping /tmp/logs/__cluster_metadata-0/00000000000000000000.log
      Starting offset: 0
      baseOffset: 0 lastOffset: 0 count: 1 baseSequence: -1 [...]
      | offset: 0 CreateTime: 1650857270775 keySize: 4 valueSize: 19 sequence: -1 headerKeys: [] controlType: LEADER_CHANGE(2)
      baseOffset: 1 lastOffset: 1 count: 1 baseSequence: -1 [...]{"type":"REGISTER_BROKER_RECORD","version":0,"data":{"...
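When correlating the metadata log with other logs, it helps to turn a `CreateTime` value into a readable timestamp. The sketch below assumes the value is milliseconds since the Unix epoch; `create_time_to_utc` is a hypothetical helper, not part of any Kafka tool.

```python
from datetime import datetime, timedelta, timezone


def create_time_to_utc(create_time_ms: int) -> datetime:
    """Convert an epoch-milliseconds CreateTime into a UTC datetime."""
    seconds, millis = divmod(create_time_ms, 1000)
    # Integer arithmetic keeps the millisecond part exact.
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


# CreateTime taken from the LEADER_CHANGE record in the dump above.
print(create_time_to_utc(1650857270775).isoformat())
```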
  • 56. Interactive Metadata Shell 56 ● Replaces zookeeper-shell in KRaft clusters ● Data sources ○ Snapshot ○ Running controller cluster ● Reads __cluster_metadata log entries into memory ● Constructs a “virtual filesystem” with the cluster’s information ● Commands available: ls, pwd, cd, find, etc.
  • 57. kafka-metadata-shell 57
      $ ./bin/kafka-metadata-shell.sh --snapshot /tmp/logs/__cluster_metadata-0/00000000000000000000.log
      Loading...
      Starting...
      [ Kafka Metadata Shell ]
      >> ls
      brokers configs local metadataQuorum topicIds topics
      >> cat /brokers/1/registration
      RegisterBrokerRecord(brokerId=1, incarnationId=tHo3Z8dYSuONV5hA82BVug, brokerEpoch=0, endPoints=[BrokerEndpoint(name='PLAINTEXT', host='localhost', port=9092, securityProtocol=0)], features=[], rack=null, fenced=true)
  • 58. Monitoring the Quorum 58 ● Metrics ○ MetadataOffset ○ SnapshotLag ○ SnapshotSizeBytes ○ MetadataCommitRate ○ MetadataCommitLatency ○ Current Raft state (follower, leader, candidate, observer) ■ Important for controller health ● DescribeQuorum RPC ○ Leader ID ○ Leader epoch ○ High water mark ○ Current voters ■ Controllers ○ Current observers ■ Brokers ■ Possible metadata shell instances, etc. ○ Log end offset of all followers
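The per-node log end offsets are enough to derive two useful health signals: how far each node trails the leader, and the highest offset that a majority of voters has reached. This sketch uses plain dictionaries of node id to log end offset; that shape and the function names are assumptions for illustration, not the DescribeQuorum response format.

```python
def lag_behind_leader(leader_id: int, log_end_offsets: dict[int, int]) -> dict[int, int]:
    """Offsets each non-leader node still has to fetch to match the leader."""
    leader_end = log_end_offsets[leader_id]
    return {node: leader_end - end
            for node, end in sorted(log_end_offsets.items())
            if node != leader_id}


def majority_replicated_offset(voter_offsets: dict[int, int]) -> int:
    """Largest offset that a majority of voters have reached."""
    ordered = sorted(voter_offsets.values(), reverse=True)
    return ordered[len(ordered) // 2]


voters = {1: 120, 2: 118, 3: 90}     # controllers; node 1 is the leader
observers = {101: 120, 102: 45}      # brokers following the metadata log

print(lag_behind_leader(1, voters))
print(lag_behind_leader(1, {1: voters[1], **observers}))
print(majority_replicated_offset(voters))
```

A voter that keeps falling behind (node 3 here) reduces the margin for failure even though the quorum is still making progress.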
  • 60. Upgrading Clusters from ZK Mode 60 ● Initially: all metadata in ZooKeeper. ● All brokers in ZK mode. [Diagram: three brokers, all in ZK mode]
  • 61. Upgrading Clusters from ZK Mode 61 ● We add a quorum of new KRaft controllers to the cluster [Diagram: three brokers in ZK mode; three KRaft controllers]
  • 62. Upgrading Clusters from ZK Mode 62 ● The KRaft controllers elect a leader from their ranks and force that leader to be the cluster leader in ZK. [Diagram: three brokers in ZK mode; three KRaft controllers]
  • 63. Upgrading Clusters from ZK Mode 63 ● The KRaft controller loads all metadata from ZK, and sends out the appropriate LeaderAndIsr and UpdateMetadata requests (UMRs). ● Future metadata changes go into the KRaft quorum. [Diagram: three brokers in ZK mode; three KRaft controllers holding the metadata]
  • 64. Upgrading Clusters from ZK Mode 64 ● Now brokers can be rolled one by one. ● KRaft controller will still send LeaderAndIsr, etc. to un-upgraded brokers. [Diagram: two brokers in ZK mode, one in KRaft mode; three KRaft controllers]
  • 65. Upgrading Clusters from ZK Mode 65 ● This relies on brokers never directly communicating with ZK (except to register themselves) ● KIP-591 Forwarding [Diagram: one broker in ZK mode, two in KRaft mode; three KRaft controllers]
  • 66. Upgrading Clusters from ZK Mode 66 ● Once all brokers have been rolled, ZK can be removed. [Diagram: three brokers in KRaft mode; three KRaft controllers]
  • 68. KRaft Project Phases 68 1. Design 2. Initial Implementation 3. Available for testing 4. Available for production 5. Bridge release 6. ZK removed
  • 69. KRaft Project Phases 69 1. Design 2. Initial Implementation 3. Available for testing 4. Available for production 5. Bridge release 6. ZK removed
  • 70. Design Phase 70 ● During the design phase, we had many upstream discussions about the goals of the KRaft project (called the KIP-500 project at the time) and how we would evolve the project to meet them. ● September 2019: vote for “KIP-500: Replace ZooKeeper with a Self-Managed Metadata Quorum” passes ● June 2020: vote for “KIP-591: Redirect Zookeeper Mutation Protocols to The Controller” passes (some revisions will be made later) ● August 2020: vote for “KIP-595: A Raft Protocol for the Quorum” passes ● September 2020: vote for “KIP-631: The Quorum-Based Kafka Controller” passes ● October 2020: vote for “KIP-630: Kafka Raft Snapshots” passes ● Several other minor KIPs filled gaps in our admin APIs which were being filled by direct ZK access
  • 71. Initial Implementation Phase 71 ● During this phase, we put the ideas from the design phase into practice. ● The implementation was pretty unstable during this time period – many big changes and fixes were happening all the time. ● There were many minor changes to remove ZK dependencies as well. ● Late 2020: KIP-500 work started in a branch. ● Early 2021: all code merged to trunk ○ New controller code ○ New self-managed Raft quorum implementation ● April 2021: Apache 2.8 released ○ First Apache release with KRaft ○ KRaft in Early Access ● September 2021: Apache 3.0 release ○ First release with KRaft snapshot support ○ Added reassignment and EOS support as well ○ Preview
  • 72. Testing Phase 72 ● This is where we are now. ● KRaft is in preview mode ○ Not recommended for production ○ Available for testing ● January 2022: Apache Kafka 3.1 released ○ Many bug fixes, new tests ○ Some new KRaft metrics ● May 2022: Apache Kafka 3.2 released (soon) ○ New KRaft-based authorizer
  • 73. Production Phase 73 ● Coming soon! ● During this phase, we will recommend KRaft for production use. ○ Also known as “going GA” ● There may still be some feature gaps ○ SCRAM ○ JBOD support ○ Delegation token support ● Most importantly, we will not have support for upgrading from ZK yet in this phase. ○ Therefore, in this phase, KRaft will be useful for new clusters, but not for existing ones
  • 74. Bridge Release 74 ● A bridge release is a release that supports upgrading from ZK mode to KRaft mode. ○ As described in the section on upgrading from ZK, this requires brokers to channel their admin changes through the controller, rather than mutating ZK directly. ● We will make multiple bridge releases ● We will work hard to close all the remaining feature gaps during this phase, so that there are no longer any users stuck on ZK.
  • 75. Last Phase: Removing ZK Support 75 ● Eventually, we will want to remove ZK support so that we no longer have to maintain two controller implementations ● Before we can remove ZK support, we have to deprecate it, and then make a new major release ● Timeline for full removal is TBD