Basics of Apache Kafka
Table of Contents
Introduction 1.1
Overview of Kafka 1.2
Kafka Services
AdminManager 2.1
Authorizer 2.2
Cluster 2.3
ConsumerCoordinator 2.4
ConsumerNetworkClient 2.5
DynamicConfigManager 2.6
Fetcher 2.7
GroupCoordinator 2.8
GroupMetadataManager 2.9
Kafka 2.10
KafkaApis -- API Request Handler 2.11
KafkaClient 2.12
NetworkClient 2.12.1
KafkaHealthcheck 2.13
KafkaServerStartable -- Thin Management Layer over KafkaServer 2.14
KafkaServer 2.15
KafkaConfig 2.16
KafkaController 2.17
ControllerEventManager 2.17.1
ControllerEventThread 2.17.2
ControllerEvent 2.17.3
TopicDeletion Controller Event 2.17.3.1
KafkaMetricsReporter 2.18
KafkaRequestHandler 2.19
KafkaRequestHandlerPool -- Pool of Daemon KafkaRequestHandler Threads 2.20
KafkaScheduler 2.21
LogDirFailureHandler 2.22
LogManager 2.23
Metadata 2.24
MetadataCache 2.25
MetadataResponse 2.26
MetadataUpdater 2.27
DefaultMetadataUpdater 2.27.1
OffsetConfig 2.28
Partition 2.29
PartitionStateMachine 2.30
ReplicaManager 2.31
ReplicaStateMachine 2.32
ShutdownableThread 2.33
SocketServer 2.34
TopicDeletionManager 2.35
TransactionCoordinator 2.36
TransactionStateManager 2.37
ZkUtils 2.38
Features
Topic Replication 3.1
Topic Deletion 3.2
Kafka Controller Election 3.3
Architecture
Broker -- Kafka Server 4.1
Topics 4.2
Messages 4.3
Kafka Clients 4.4
Producers 4.4.1
KafkaProducer 4.4.1.1
Sender 4.4.1.2
Consumers 4.4.2
KafkaConsumer -- Main Class For Kafka Consumers 4.4.2.1
Deserializer 4.4.2.2
ConsumerConfig 4.4.2.3
Consumer 4.4.2.4
ConsumerInterceptor 4.4.2.5
Clusters 4.5
Metrics
Sensor 5.1
MetricsReporter 5.2
ProducerMetrics 5.3
SenderMetrics 5.4
Kafka Tools 6.1
kafka-configs.sh 6.1.1
kafka-consumer-groups.sh 6.1.2
kafka-topics.sh 6.1.3
Properties 6.2
bootstrap.servers 6.2.1
client.id 6.2.2
enable.auto.commit 6.2.3
group.id 6.2.4
retry.backoff.ms 6.2.5
Logging 6.3
WorkerGroupMember 7.1
ConnectDistributed 7.2
Appendix
Further reading or watching 9.1
Introduction
If you like the Apache Kafka notes you should seriously consider participating in my own,
very hands-on Spark Workshops.
This collection of notes (what some may rashly call a "book") serves as the ultimate place of mine to collect all the nuts and bolts of using Apache Kafka. The notes aim to help me design and develop better products with Kafka. They are also a viable proof of my understanding of Apache Kafka. I do eventually want to reach the highest level of mastery in Apache Kafka.
Expect text and code snippets from a variety of public sources. Attribution follows.
Overview of Kafka
Apache Kafka is an open source project for a distributed publish-subscribe messaging
system rethought as a distributed commit log.
Kafka stores messages in topics that are partitioned and replicated across multiple brokers
in a cluster. Producers send messages to topics from which consumers read.
Messages are byte arrays (with String, JSON, and Avro being the most common formats). If
a message has a key, Kafka makes sure that all messages of the same key are in the same
partition.
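To see key-based partitioning in action, send two records with the same key and compare the partitions reported back. The following is a minimal Scala sketch; the topic my-topic and a broker at localhost:9092 are assumptions.

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

// a minimal sketch: records with the same key land in the same partition
object KeyedRecords extends App {
  val props = new Properties
  props.put("bootstrap.servers", "localhost:9092") // assumed local broker
  val producer = new KafkaProducer[String, String](props, new StringSerializer, new StringSerializer)

  // same key "user-1" => same partition, regardless of the value
  val m1 = producer.send(new ProducerRecord("my-topic", "user-1", "first")).get
  val m2 = producer.send(new ProducerRecord("my-topic", "user-1", "second")).get
  println(s"partitions: ${m1.partition} and ${m2.partition}") // always equal
  producer.close()
}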
Consumers may be grouped in a consumer group with multiple consumers. Each consumer
in a consumer group will read messages from a unique subset of partitions in each topic
they subscribe to. Each message is delivered to one consumer in the group, and all
messages with the same key arrive at the same consumer.
Durability: Kafka does not track which messages were read by each consumer. Kafka keeps all messages for a finite amount of time, and it is the consumers' responsibility to track their location per topic, i.e. offsets.
It is worth noting that Kafka is often compared to the following open source projects:
1. Apache ActiveMQ and RabbitMQ given they are message broker systems, too.
2. Apache Flume for its ingestion capabilities designed to send data to HDFS and Apache
HBase.
AdminManager
AdminManager is...FIXME
Figure 1. AdminManager
logIdent is [Admin Manager on Broker [brokerId]].
createTopicPolicy
topicPurgatory
Refer to Logging.
createTopics(
timeout: Int,
validateOnly: Boolean,
createInfo: Map[String, CreateTopicsRequest.TopicDetails],
responseCallback: (Map[String, ApiError]) => Unit): Unit
createTopics FIXME
KafkaConfig
Metrics
MetadataCache
ZkUtils
Authorizer
Authorizer is...FIXME
configure Method
Caution FIXME
Cluster
Caution FIXME
topics Method
Caution FIXME
availablePartitionsForTopic Method
Caution FIXME
ConsumerCoordinator
ConsumerCoordinator is an AbstractCoordinator that...FIXME
refreshCommittedOffsetsIfNeeded Method
FIXME
refreshCommittedOffsetsIfNeeded FIXME
maybeLeaveGroup Method
FIXME
maybeLeaveGroup FIXME
updatePatternSubscription Method
FIXME
updatePatternSubscription FIXME
needRejoin Method
FIXME
needRejoin FIXME
timeToNextPoll Method
FIXME
timeToNextPoll FIXME
poll Method
FIXME
poll FIXME
commitOffsetsSync Method
FIXME
commitOffsetsSync FIXME
commitOffsetsAsync Method
FIXME
commitOffsetsAsync FIXME
maybeAutoCommitOffsetsNow Method
FIXME
maybeAutoCommitOffsetsNow FIXME
addMetadataListener Method
FIXME
addMetadataListener FIXME
commitOffsetsSync Method
FIXME
commitOffsetsSync FIXME
fetchCommittedOffsets Method
FIXME
fetchCommittedOffsets FIXME
LogContext
ConsumerNetworkClient
Group ID
rebalanceTimeoutMs
sessionTimeoutMs
heartbeatIntervalMs
Collection of PartitionAssignors
Metadata
SubscriptionState
Metrics
Time
retryBackoffMs
autoCommitEnabled flag
autoCommitIntervalMs
ConsumerInterceptors
excludeInternalTopics flag
leaveGroupOnClose flag
ConsumerNetworkClient
ConsumerNetworkClient is a high-level network client for Kafka consumers that...FIXME (it is also used in Kafka Connect's WorkerGroupMember).
Refer to Logging.
wakeup Method
void wakeup()
wakeup turns the internal wakeup flag on and requests KafkaClient to wake up.
ensureFreshMetadata Method
Caution FIXME
send Method
Caution FIXME
pendingRequestCount Method
Caution FIXME
leastLoadedNode Method
Caution FIXME
poll Method
poll FIXME
awaitMetadataUpdate Method
Caution FIXME
awaitPendingRequests Method
Caution FIXME
pollNoWakeup Method
void pollNoWakeup()
pollNoWakeup FIXME
Note: pollNoWakeup is used when KafkaConsumer polls.
LogContext
KafkaClient
Metadata
Time
retryBackoffMs
requestTimeoutMs
DynamicConfigManager
DynamicConfigManager is...FIXME
startup Method
startup
startup FIXME
Fetcher
Fetcher is created exclusively when KafkaConsumer is created.
client -- ConsumerNetworkClient that is given when Fetcher is created.
ConsumerNetworkClient
Fetch size
Metadata
SubscriptionState
Metrics
FetcherMetricsRegistry
Time
IsolationLevel
FIXME
sendFetches Method
Caution FIXME
beginningOffsets Method
Caution FIXME
retrieveOffsetsByTimes Method
Caution FIXME
getAllTopicMetadata gets topic metadata specifying no topics (which means all topics
available).
GroupCoordinator
GroupCoordinator is...FIXME
Caution FIXME
Broker ID
GroupConfig
OffsetConfig
GroupMetadataManager
DelayedOperationPurgatory[DelayedHeartbeat]
DelayedOperationPurgatory[DelayedJoin]
Time
startup first prints out the following INFO message to the logs:
In the end, startup prints out the following INFO message to the logs:
GroupMetadataManager
GroupMetadataManager is...FIXME
groupMetadataTopicPartitionCount
scheduler KafkaScheduler
enableMetadataExpiration Method
enableMetadataExpiration(): Unit
Broker ID
ApiVersion
OffsetConfig
ReplicaManager
ZkUtils
Time
cleanupGroupMetadata takes the current time (using time) and, for every GroupMetadata in the cache:
1. FIXME
In the end, cleanupGroupMetadata prints out the following INFO message to the logs:
getGroupMetadataTopicPartitionCount: Int
getGroupMetadataTopicPartitionCount gives the number of partitions of the __consumer_offsets topic.
Kafka -- Standalone Command-Line Application
kafka.Kafka is a standalone command-line application that starts a Kafka broker.
getPropsFromArgs Method
Caution FIXME
In the end, main starts the KafkaServerStartable and waits till it finishes.
main terminates the JVM with status 0 when KafkaServerStartable shuts down properly.
registerLoggingSignalHandler(): Unit
registerLoggingSignalHandler registers signal handlers for TERM , INT and HUP signals
so that, once received, it prints out the following INFO message to the logs:
KafkaApis -- API Request Handler
Figure 1. KafkaApis
Table 1. KafkaApis's API Keys and Handlers (in alphabetical order)
Key Handler
CREATE_TOPICS handleCreateTopicsRequest
handle first prints out the following TRACE message to the logs:
handle then finds the handler that is responsible for the apiKey (in the header of the input
handleCreateTopicsRequest FIXME
Caution FIXME
handleCreatePartitionsRequest Handler
handleCreatePartitionsRequest FIXME
handleDeleteTopicsRequest Handler
handleDeleteTopicsRequest FIXME
handleControlledShutdownRequest Handler
handleControlledShutdownRequest FIXME
RequestChannel
ReplicaManager
AdminManager
GroupCoordinator
TransactionCoordinator
KafkaController
ZkUtils
Broker ID
KafkaConfig
MetadataCache
Metrics
Optional Authorizer
QuotaManagers
BrokerTopicStats
Cluster ID
Time
KafkaClient
KafkaClient is the contract for...FIXME
KafkaClient Contract
package org.apache.kafka.clients;
wakeup -- FIXME -- Used when: FIXME

poll -- FIXME -- Used when ConsumerNetworkClient polls
NetworkClient
NetworkClient is a KafkaClient that...FIXME
wakeup Method
void wakeup()
poll Method
poll FIXME
handleCompletedReceives Method
handleCompletedReceives FIXME
MetadataUpdater
Metadata
Selectable
Client ID
maxInFlightRequestsPerConnection
reconnectBackoffMs
reconnectBackoffMax
socketSendBuffer
socketReceiveBuffer
requestTimeoutMs
Time
discoverBrokerVersions flag
ApiVersions
Sensor
LogContext
KafkaHealthcheck
KafkaHealthcheck registers the broker it runs on with Zookeeper (which in turn makes the
broker visible to other brokers that together can form a Kafka cluster).
Figure 1. KafkaHealthcheck
Table 1. KafkaHealthcheck's Internal Properties (e.g. Registries and Counters)
Name Description
sessionExpireListener SessionExpireListener
Broker ID
Advertised endpoints
ZkUtils
ApiVersion
startup
register(): Unit
For every EndPoint with no host assigned (in advertisedEndpoints), register assigns the
fully-qualified domain name of the local host.
register then finds the first EndPoint with PLAINTEXT security protocol or creates an
empty EndPoint .
Tip Define EndPoint with PLAINTEXT security protocol for older clients to connect.
In the end, register requests ZkUtils to registerBrokerInZk for brokerId, the host and port
of the PLAINTEXT endpoint, the updated endpoints, the JMX port, the optional rack and
protocol version.
Note register makes a broker visible for other brokers to form a Kafka cluster.
handleNewSession Method
Caution FIXME
KafkaServerStartable -- Thin Management Layer over KafkaServer
KafkaServerStartable is a thin management layer to manage a single KafkaServer instance.
awaitShutdown Method
Caution FIXME
shutdown Method
Caution FIXME
1. KafkaConfig
2. Collection of KafkaMetricsReporters
Caution FIXME
startup Method
startup(): Unit
In case of any exceptions, startup exits the JVM with status 1. You should see the following FATAL message in the logs if that happens.
Note startup is used when a Kafka Broker starts (on command line).
KafkaServer
KafkaServer is a Kafka broker.
apis KafkaApis
brokerState BrokerState
_brokerTopicStats BrokerTopicStats
_clusterId Cluster ID
credentialProvider CredentialProvider
dynamicConfigHandlers
dynamicConfigManager DynamicConfigManager
groupCoordinator GroupCoordinator
kafkaController KafkaController
kafkaHealthcheck KafkaHealthcheck
logContext LogContext
logDirFailureChannel LogDirFailureChannel
logManager LogManager
metadataCache MetadataCache
replicaManager ReplicaManager
reporters -- Collection of MetricsReporter. Used when...FIXME
requestHandlerPool KafkaRequestHandlerPool
socketServer SocketServer
transactionCoordinator TransactionCoordinator
quotaManagers QuotaManagers
zkUtils ZkUtils
Caution FIXME
Caution FIXME
Caution FIXME
Caution FIXME
notifyClusterListeners Method
Caution FIXME
KafkaConfig
Caution FIXME
startup(): Unit
Internally, startup first prints out the following INFO message to the logs:
startup notifies cluster resource listeners (i.e. the KafkaMetricsReporters and the configured metrics reporters).

startup creates the SocketServer (for KafkaConfig, Metrics and CredentialProvider) and requests it to start up.
startup creates the KafkaController (for KafkaConfig, ZkUtils, Metrics and the optional thread name prefix) and starts it.
startup creates the GroupCoordinator (for KafkaConfig, ZkUtils and ReplicaManager) and
configures it.
startup creates the KafkaRequestHandlerPool (for the broker ID, the RequestChannel, KafkaApis and num.io.threads).
startup creates the KafkaHealthcheck (for the broker ID, the advertised listeners, ZkUtils and the ApiVersion) and registers the app info MBean kafka.server:type=app-info,id=[brokerId].
In the end, you should see the following INFO message in the logs:
Note: The INFO message above uses the so-called log ident with the value of the broker.id property and is always in the format [Kafka Server [brokerId]], after a Kafka server has fully started.
KafkaConfig
KafkaConfig is the configuration of a Kafka server and the services.
hostName host.name
numNetworkThreads num.network.threads
port port
replicaLagTimeMaxMs
Caution FIXME
getConfiguredInstances Method
Caution FIXME
getListeners: Seq[EndPoint]
getListeners creates the EndPoints if defined using the listeners Kafka property, or defaults to PLAINTEXT://[hostName]:[port].
KafkaController
KafkaController is a Kafka service responsible for:
topic deletion
FIXME
In a Kafka cluster, one of the brokers serves as the controller, which is responsible for
managing the states of partitions and replicas and for performing administrative tasks
like reassigning partitions.
Figure 1. KafkaController
KafkaController is part of every Kafka broker, but only one KafkaController is active at all
times.
ControlledShutdown (state: ControlledShutdown) -- takes a broker ID and controlledShutdownCallback: Try[Set[TopicAndPartition]] => Unit

Reelect (state: ControllerChange) --
2. (only when the broker is no longer an active controller) Resigns as the active controller
3. elect

Startup (state: ControllerChange) --
1. registerSessionExpirationListener
2. registerControllerChangeListener
3. elect
controllerContext

eventManager -- ControllerEventManager for controllerContext.stats.rateAndTimeMetrics and the updateMetrics listener

isrChangeNotificationListener -- IsrChangeNotificationListener

kafkaScheduler -- KafkaScheduler with a single daemon thread with prefix kafka-scheduler

partitionStateMachine -- PartitionStateMachine

replicaStateMachine -- ReplicaStateMachine

topicDeletionManager -- TopicDeletionManager
logDirEventNotificationListener -- LogDirEventNotificationListener. De-registered in deregisterLogDirEventNotificationListener when KafkaController resigns as the active controller.
Refer to Logging.
initiateReassignReplicasForTopicPartition
Method
initiateReassignReplicasForTopicPartition
initiateReassignReplicasForTopicPartition FIXME
deregisterPartitionReassignmentIsrChangeListeners Method
deregisterPartitionReassignmentIsrChangeListeners
deregisterPartitionReassignmentIsrChangeListeners FIXME
resetControllerContext Method
resetControllerContext
resetControllerContext FIXME
deregisterBrokerChangeListener Method
deregisterBrokerChangeListener
deregisterBrokerChangeListener FIXME
deregisterTopicChangeListener Method
deregisterTopicChangeListener
deregisterTopicChangeListener FIXME
onControllerResignation(): Unit
onControllerResignation starts by printing out the following DEBUG message to the logs:
Resigning
znodes in order:
offlinePartitionCount
preferredReplicaImbalanceCount
globalTopicCount
globalPartitionCount
onControllerResignation deregisterPartitionReassignmentIsrChangeListeners.
onControllerResignation deregisterTopicChangeListener.
onControllerResignation de-registers partitionModificationsListeners.
onControllerResignation deregisterTopicDeletionListener.
onControllerResignation deregisterBrokerChangeListener.
onControllerResignation resetControllerContext.
In the end, onControllerResignation prints out the following DEBUG message to the logs:
Resigned
deregisterIsrChangeNotificationListener(): Unit

deregisterIsrChangeNotificationListener prints out the following message to the logs:

De-registering IsrChangeNotificationListener

deregisterLogDirEventNotificationListener(): Unit

deregisterLogDirEventNotificationListener prints out the following message to the logs:

De-registering logDirEventNotificationListener
deregisterPreferredReplicaElectionListener(): Unit
deregisterPartitionReassignmentListener(): Unit
triggerControllerMove(): Unit
triggerControllerMove FIXME
Note: triggerControllerMove is used when:

1. KafkaController handleIllegalState
2. KafkaController caught an exception while electing or becoming a controller
handleIllegalState FIXME
sendUpdateMetadataRequest Method
sendUpdateMetadataRequest(): Unit
sendUpdateMetadataRequest FIXME
updateLeaderEpochAndSendRequest(): Unit
updateLeaderEpochAndSendRequest FIXME
shutdown Method
shutdown(): Unit
shutdown FIXME
Caution FIXME
onBrokerStartup Method
onBrokerStartup FIXME
elect Method
elect(): Unit
elect FIXME
Note elect is used when KafkaController enters Startup and Reelect states.
onControllerFailover Method
Caution FIXME
isActive Method
isActive: Boolean
isActive says whether the activeControllerId equals the broker ID (from KafkaConfig).
KafkaConfig
ZkUtils
Time
Metrics
startup(): Unit
startup puts Startup event at the end of the event queue of ControllerEventManager and
requests it to start.
registerSessionExpirationListener(): Unit
registerControllerChangeListener(): Unit
registerControllerChangeListener registers a ControllerChangeListener (with the ControllerEventManager).
Note: ControllerChangeListener emits:

1. a ControllerChange event with the current controller ID (on the event queue of ControllerEventManager) every time the data of the znode changes
2. a Reelect event when the data associated with the znode has been deleted
registerBrokerChangeListener(): Option[Seq[String]]
getControllerID(): Int

getControllerID returns the ID of the active Kafka controller that is associated with the /controller znode.

Internally, getControllerID requests ZkUtils for data associated with the /controller znode.

If available, getControllerID parses the data (being the current controller info in JSON format) to extract the brokerid field.
$ ./bin/zookeeper-shell.sh 0.0.0.0:2181
Connecting to 0.0.0.0:2181
Welcome to ZooKeeper!
...
get /controller
{"version":1,"brokerid":100,"timestamp":"1506197069724"}
cZxid = 0xf9
ctime = Sat Sep 23 22:04:29 CEST 2017
mZxid = 0xf9
mtime = Sat Sep 23 22:04:29 CEST 2017
pZxid = 0xf9
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x15eaa3a4fdd000d
dataLength = 56
numChildren = 0
registerTopicDeletionListener(): Option[Seq[String]]
deregisterTopicDeletionListener(): Unit
ControllerEventManager
ControllerEventManager is...FIXME
thread -- ControllerEventThread with controller-event-thread thread name
start(): Unit
ControllerEventThread
ControllerEventThread is a ShutdownableThread that is started when ControllerEventManager is started.
doWork(): Unit
doWork takes and removes the head of event queue (waiting if necessary until an element
becomes available).
Note: The very first event in the event queue is Startup that KafkaController puts there when it is started.
doWork finds the KafkaTimer for the state in rateAndTimeMetrics lookup table (of
ControllerEventManager ).
ControllerEvent
ControllerEvent is the contract of events in the lifecycle of KafkaController state machine
that, once emitted, triggers state change and the corresponding process action.
package kafka.controller
Note: ControllerEvent is a Scala sealed trait and so all the available events are in a single compilation unit (i.e. a file).
process -- Used when ControllerEventThread does the work to trigger an action associated with state change.
TopicDeletion Controller Event
state is TopicDeletion .
process Method
process(): Unit
Note process is executed on the active controller only (and does nothing otherwise).
process requests ControllerContext for allTopics and finds the topics that are supposed to be deleted but do not exist.

If there are any non-existent topics, process prints out the following WARN message to the logs and requests ZkUtils to deletePathRecursive the /admin/delete_topics/[topicName] znode for every topic in the list.
With delete.topic.enable enabled (i.e. true ), process prints out the following INFO
message to the logs:
With delete.topic.enable disabled (i.e. false ), process prints out the following INFO
message to the logs (for every topic):
KafkaMetricsReporter
Caution FIXME
KafkaRequestHandler
KafkaRequestHandler is a thread of execution (i.e. a Java Runnable) that handles Kafka API requests.

KafkaRequestHandler is created...FIXME
run(): Unit
Caution FIXME
ID
Broker ID
RequestChannel
KafkaApis
Time
KafkaRequestHandlerPool -- Pool of Daemon KafkaRequestHandler Threads
KafkaRequestHandlerPool is a pool of daemon kafka-request-handler threads that are started immediately when KafkaRequestHandlerPool is created.
shutdown Method
Caution FIXME
Broker ID
RequestChannel
KafkaApis
Time
KafkaScheduler
KafkaScheduler is a Scheduler to schedule tasks in Kafka.
Refer to Logging.
startup(): Unit
When startup is executed, you should see the following DEBUG message in the logs:
startup initializes the executor with threads threads. The names of the threads use the configured prefix followed by the scheduler thread number.
Caution FIXME
shutdown Method
Caution FIXME
Caution FIXME
schedule(name: String, fun: () => Unit, delay: Long, period: Long, unit: TimeUnit): Unit

When schedule is executed, you should see the following DEBUG message in the logs:

DEBUG Scheduling task [name] with initial delay [delay] ms and period [period] ms. (kafka.utils.KafkaScheduler)
schedule first makes sure that KafkaScheduler is running (which simply means that startup was executed).
For positive period , schedule schedules the thread every period after the initial delay .
Otherwise, schedule schedules the thread once.
Whenever the thread is executed, and before fun gets triggered, you should see the following TRACE message in the logs:
After the execution thread is finished, you should see the following TRACE message in the
logs:
In case of any exceptions, the execution thread catches them and you should see the
following ERROR message in the logs:
Scheduler Contract
trait Scheduler {
def startup(): Unit
def shutdown(): Unit
def isStarted: Boolean
def schedule(name: String, fun: () => Unit, delay: Long = 0, period: Long = -1, unit
: TimeUnit = TimeUnit.MILLISECONDS)
}
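To get a feeling for the contract, here is a minimal sketch that schedules a recurring task with KafkaScheduler; the thread count, name prefix and timings are arbitrary choices.

import java.util.concurrent.TimeUnit
import kafka.utils.KafkaScheduler

object SchedulerDemo extends App {
  // a single daemon scheduler thread, like KafkaController uses internally
  val scheduler = new KafkaScheduler(threads = 1, threadNamePrefix = "demo-scheduler-")
  scheduler.startup()

  // run the task immediately and then every 1000 ms
  scheduler.schedule(
    name = "hello-task",
    fun = () => println(s"tick at ${System.currentTimeMillis}"),
    delay = 0,
    period = 1000,
    unit = TimeUnit.MILLISECONDS)

  Thread.sleep(5000) // let a few ticks happen
  scheduler.shutdown()
}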
LogDirFailureHandler
LogDirFailureHandler is...FIXME
start Method
Caution FIXME
LogManager
Caution FIXME
startup Method
Caution FIXME
Metadata
Metadata FIXME
cluster -- Cluster. Empty when Metadata is created. Updated when...FIXME

needMetadataForAllTopics -- Flag...FIXME. Disabled (i.e. false) when Metadata is created. Updated when Metadata is requested to set state to indicate that metadata for all topics in the Kafka cluster is required.
Refer to Logging.
timeToNextUpdate Method
timeToNextUpdate FIXME
Note: timeToNextUpdate is used when:

ConsumerNetworkClient ensureFreshMetadata
DefaultMetadataUpdater (of NetworkClient) isUpdateDue and maybeUpdate
add Method
add FIXME
requestUpdate Method
requestUpdate FIXME
Caution FIXME
update FIXME
Note: update is used when:

KafkaConsumer is created
KafkaProducer is created
refreshBackoffMs
metadataExpireMs
allowAutoTopicCreation flag
topicExpiryEnabled flag
ClusterResourceListeners
needMetadataForAllTopics .
Note: needMetadataForAllTopics is used when Metadata is requested to add and setTopics.
MetadataCache
MetadataCache is...FIXME
MetadataResponse
cluster Method
Caution FIXME
MetadataUpdater
MetadataUpdater Contract
package org.apache.kafka.clients;
interface MetadataUpdater {
List<Node> fetchNodes();
void handleDisconnection(String destination);
void handleCompletedMetadataResponse(RequestHeader requestHeader, long now, MetadataResponse metadataResponse);
boolean isUpdateDue(long now);
long maybeUpdate(long now);
void requestUpdate();
}
handleCompletedMetadataResponse -- Used exclusively when NetworkClient handles completed receives.
DefaultMetadataUpdater
DefaultMetadataUpdater is a MetadataUpdater that...FIXME
Caution FIXME
isUpdateDue Method
Caution FIXME
maybeUpdate Method
Caution FIXME
handleCompletedMetadataResponse FIXME
OffsetConfig
OffsetConfig is...FIXME
Partition
A Kafka topic is spread across a Kafka cluster as a virtual group of one or more partitions.
A topic partition can be replicated across a Kafka cluster to one or more Kafka servers.
A topic partition has one partition leader and zero or more topic replicas.
Kafka producers publish messages to the partition leader, and Kafka consumers read them from the partition leader as well.

Partition is...FIXME
makeLeader Method
makeLeader(
controllerId: Int,
partitionStateInfo: LeaderAndIsrRequest.PartitionState,
correlationId: Int): Boolean
makeLeader FIXME
makeFollower Method
makeFollower(
controllerId: Int,
partitionStateInfo: LeaderAndIsrRequest.PartitionState,
correlationId: Int): Boolean
makeFollower FIXME
leaderReplicaIfLocal Method
leaderReplicaIfLocal: Option[Replica]
leaderReplicaIfLocal gives...FIXME
maybeShrinkIsr Method
Caution FIXME
Topic name
Partition ID
Time
ReplicaManager
PartitionStateMachine
PartitionStateMachine is...FIXME
shutdown Method
FIXME
shutdown FIXME
ReplicaManager
ReplicaManager is created and started when KafkaServer is requested to start up.
ReplicaManager is a KafkaMetricsGroup .
logDirFailureHandler LogDirFailureHandler
OfflinePartition
getLeaderPartitions: List[Partition]
getLeaderPartitions gives the partitions (from the allPartitions pool) that are not offline and whose leader replica is on this broker.
maybePropagateIsrChanges Method
Caution FIXME
isr-expiration Task
Caution FIXME
isr-change-propagation Task
Caution FIXME
KafkaConfig
Metrics
Time
ZkUtils
Scheduler
LogManager
isShuttingDown flag
ReplicationQuotaManager
BrokerTopicStats
MetadataCache
LogDirFailureChannel
DelayedOperationPurgatory[DelayedProduce]
DelayedOperationPurgatory[DelayedFetch]
DelayedOperationPurgatory[DelayedDeleteRecords]
startup(): Unit
1. isr-expiration
2. isr-change-propagation
maybeShrinkIsr(): Unit
TRACE Evaluating ISR list of partitions to see which replicas can be removed from the ISR

maybeShrinkIsr requests the partitions (from the allPartitions pool that are not offline partitions) to maybeShrinkIsr (with replica.lag.time.max.ms).
ReplicaStateMachine
ReplicaStateMachine is...FIXME
shutdown Method
FIXME
shutdown FIXME
ShutdownableThread
ShutdownableThread is the contract for non-daemon threads of execution.
shutdownLatch -- Java's java.util.concurrent.CountDownLatch with the number of passes being 1
run Method
run(): Unit
Note run is a part of java.lang.Runnable that is executed when the thread is started.
run first prints out the following INFO message to the logs:
Starting
In the end, run decrements the count of shutdownLatch and prints out the following INFO
message to the logs:
Stopped
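A minimal sketch of a custom ShutdownableThread shows the lifecycle described above; the class name and timings are made up.

import kafka.utils.ShutdownableThread

// doWork is invoked in a loop until shutdown is requested
class Heartbeater extends ShutdownableThread(name = "heartbeater") {
  override def doWork(): Unit = {
    println("beat")
    Thread.sleep(500) // simulate work
  }
}

object HeartbeaterDemo extends App {
  val t = new Heartbeater
  t.start()           // prints out "Starting"
  Thread.sleep(2000)
  t.shutdown()        // awaits shutdownLatch; prints out "Stopped"
}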
SocketServer
SocketServer is a NIO socket server.
MemoryPoolAvailable
MemoryPoolUsed
Table 2. SocketServer's Internal Properties (e.g. Registries and Counters) (in alphabetical order)
Name Description
acceptors Acceptor threads per EndPoint
connectionQuotas ConnectionQuotas
maxQueuedRequests
maxConnectionsPerIp
maxConnectionsPerIpOverrides
memoryPool
requestChannel
totalProcessorThreads -- Total number of processors, i.e. numProcessorThreads for every endpoint
Caution FIXME
startup(): Unit
For every endpoint (in endpoints registry) startup does the following:
4. Starts a non-daemon thread for the Acceptor with the name kafka-socket-acceptor-[listenerName]-[securityProtocol]-[port] (e.g. kafka-socket-acceptor-...)
In the end, startup prints out the following INFO message to the logs:
KafkaConfig
Metrics
Time
CredentialProvider
TopicDeletionManager
TopicDeletionManager is...FIXME
topicsIneligibleForDeletion -- The names of the topics that must not be deleted (i.e. are ineligible for deletion)
Refer to Logging.
Caution FIXME
enqueueTopicsForDeletion Method
Caution FIXME
failReplicaDeletion Method
Caution FIXME
KafkaController
ControllerEventManager
markTopicIneligibleForDeletion Method
If there are any topics in the intersection, markTopicIneligibleForDeletion prints out the
following INFO message to the logs:
Note: markTopicIneligibleForDeletion is used when KafkaController initiateReassignReplicasForTopicPartition.
reset(): Unit
(only with delete.topic.enable Kafka property enabled) reset removes all elements from the
following internal registries:
topicsToBeDeleted
partitionsToBeDeleted
topicsIneligibleForDeletion
TransactionCoordinator
TransactionCoordinator is...FIXME
startup Method
startup
startup FIXME
TransactionStateManager
TransactionStateManager is...FIXME
getTransactionTopicPartitionCount Method
getTransactionTopicPartitionCount
getTransactionTopicPartitionCount FIXME
ZkUtils
ZkUtils is...FIXME
Table 2. ZkUtils's Internal Properties (e.g. Registries and Counters) (in alphabetical order)
Name Description
persistentZkPaths
zkPath
deletePathRecursive Method
Caution FIXME
deletePath Method
Caution FIXME
apply(
zkUrl: String,
sessionTimeout: Int,
connectionTimeout: Int,
isZkSecurityEnabled: Boolean): ZkUtils
apply FIXME
2. FIXME
2. FIXME
subscribeChildChanges FIXME
subscribeChildChanges requests ZkClient to subscribe to child changes for the given path and childListener.

subscribeDataChanges requests ZkClient to subscribe to data changes for the given path and dataListener.
registerBrokerInZk Method
registerBrokerInZk(
id: Int,
host: String,
port: Int,
advertisedEndpoints: Seq[EndPoint],
jmxPort: Int,
rack: Option[String],
apiVersion: ApiVersion): Unit
registerBrokerInZk FIXME
getTopicPartitionCount Method
getTopicPartitionCount FIXME
"version":1
"brokerid":[brokerId]
"timestamp":[timestamp]
import kafka.utils._
scala> ZkUtils.controllerZkData(1, System.currentTimeMillis())
res0: String = {"version":1,"brokerid":1,"timestamp":"1506161225262"}
ZkClient
ZkConnection
isSecure flag
readDataMaybeNull returns None (for Option[String] ) when path znode is not available.
Topic Replication
Topic Replication is the process of offering fail-over capability for a topic.
./bin/kafka-topics.sh --create \
--topic my-topic \
--replication-factor 1 \ // <-- define replication factor
--partitions 1 \
--zookeeper localhost:2181
Producers always send requests to the broker that is the current leader replica for a topic
partition.
Data from producers is first saved to a commit log before consumers can find out that it is
available. It will only be visible to consumers when the followers acknowledge that they have
got the data and stored it in their local logs.
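Besides kafka-topics.sh, a replicated topic can also be created programmatically. Below is a sketch using the Java AdminClient (available since Kafka 0.11); the broker address, topic name and replication factor 3 are assumptions, and the factor cannot exceed the number of brokers.

import java.util.{Collections, Properties}
import org.apache.kafka.clients.admin.{AdminClient, NewTopic}

object CreateReplicatedTopic extends App {
  val props = new Properties
  props.put("bootstrap.servers", "localhost:9092") // assumed local broker

  val admin = AdminClient.create(props)
  // 1 partition, replication factor 3 (requires at least 3 brokers)
  val topic = new NewTopic("my-topic", 1, 3.toShort)
  admin.createTopics(Collections.singleton(topic)).all().get()
  admin.close()
}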
Topic Deletion
Topic Deletion is a feature of Kafka that allows for deleting topics.
$ ./bin/kafka-server-start.sh config/server.properties \
--override delete.topic.enable=true \
--override broker.id=100 \
--override log.dirs=/tmp/kafka-logs-100 \
--override port=9192
Note that the broker 100 is the leader for remove-me topic.
Stop the broker 100 and start another with broker ID 200 .
$ ./bin/kafka-server-start.sh config/server.properties \
--override delete.topic.enable=true \
--override broker.id=200 \
--override log.dirs=/tmp/kafka-logs-200 \
--override port=9292
As you may have noticed, kafka-topics.sh --delete will only delete a topic if the topic's leader broker is available (and can acknowledge the removal). Since the broker 100 is down and currently unavailable, the topic deletion has only been recorded in Zookeeper.
As long as the leader broker 100 is not available, the topic to be deleted remains marked
for deletion.
$ ./bin/kafka-server-start.sh config/server.properties \
--override delete.topic.enable=true \
--override broker.id=100 \
--override log.dirs=/tmp/kafka-logs-100 \
--override port=9192
With kafka.controller.KafkaController logger at DEBUG level, you should see the following
messages in the logs:
DEBUG [Controller id=100] Delete topics listener fired for topics remove-me to be dele
ted (kafka.controller.KafkaController)
INFO [Controller id=100] Starting topic deletion for topics remove-me (kafka.controlle
r.KafkaController)
INFO [GroupMetadataManager brokerId=100] Removed 0 expired offsets in 0 milliseconds.
(kafka.coordinator.group.GroupMetadataManager)
DEBUG [Controller id=100] Removing replica 100 from ISR 100 for partition remove-me-0.
(kafka.controller.KafkaController)
INFO [Controller id=100] Retaining last ISR 100 of partition remove-me-0 since unclean
leader election is disabled (kafka.controller.KafkaController)
INFO [Controller id=100] New leader and ISR for partition remove-me-0 is {"leader":-1,
"leader_epoch":1,"isr":[100]} (kafka.controller.KafkaController)
INFO [ReplicaFetcherManager on broker 100] Removed fetcher for partitions remove-me-0
(kafka.server.ReplicaFetcherManager)
INFO [ReplicaFetcherManager on broker 100] Removed fetcher for partitions (kafka.serv
er.ReplicaFetcherManager)
INFO [ReplicaFetcherManager on broker 100] Removed fetcher for partitions remove-me-0
(kafka.server.ReplicaFetcherManager)
INFO Log for partition remove-me-0 is renamed to /tmp/kafka-logs-100/remove-me-0.fe6d0
39ff884498b9d6113fb22a75264-delete and is scheduled for deletion (kafka.log.LogManager
)
DEBUG [Controller id=100] Delete topic callback invoked for org.apache.kafka.common.re
quests.StopReplicaResponse@8c0f4f0 (kafka.controller.KafkaController)
INFO [Controller id=100] New topics: [Set()], deleted topics: [Set()], new partition r
eplica assignment [Map()] (kafka.controller.KafkaController)
DEBUG [Controller id=100] Delete topics listener fired for topics to be deleted (kafk
a.controller.KafkaController)
The topic is now deleted. Use Zookeeper CLI tool to confirm it.
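For completeness, the deletion can also be requested programmatically with the Java AdminClient; a sketch (the broker address is an assumption):

import java.util.{Collections, Properties}
import org.apache.kafka.clients.admin.AdminClient

object DeleteTopic extends App {
  val props = new Properties
  props.put("bootstrap.servers", "localhost:9092") // assumed local broker

  val admin = AdminClient.create(props)
  // the programmatic equivalent of kafka-topics.sh --delete --topic remove-me
  admin.deleteTopics(Collections.singleton("remove-me")).all().get()
  admin.close()
}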
Kafka Controller Election
$ ./bin/zookeeper-server-start.sh config/zookeeper.properties
...
INFO binding to port 0.0.0.0/0.0.0.0:2181 (org.apache.zookeeper.server.NIOServerCnxnFa
ctory)
Add the following line to config/log4j.properties to enable DEBUG logging level for
kafka.controller.KafkaController logger.
log4j.logger.kafka.controller.KafkaController=DEBUG, stdout
$ ./bin/kafka-server-start.sh config/server.properties \
--override broker.id=100 \
--override log.dirs=/tmp/kafka-logs-100 \
--override port=9192
...
INFO Registered broker 100 at path /brokers/ids/100 with addresses: EndPoint(192.168.1
.4,9192,ListenerName(PLAINTEXT),PLAINTEXT) (kafka.utils.ZkUtils)
INFO Kafka version : 1.0.0-SNAPSHOT (org.apache.kafka.common.utils.AppInfoParser)
INFO Kafka commitId : 852297efd99af04d (org.apache.kafka.common.utils.AppInfoParser)
INFO [KafkaServer id=100] started (kafka.server.KafkaServer)
Connect to Zookeeper using Zookeeper CLI (command-line interface). Use the official
distribution of Apache Zookeeper as described in Zookeeper Tips.
Once connected, execute get /controller to get the data associated with /controller
znode where the active Kafka controller stores the controller ID.
(optional) Clear the consoles of the two Kafka brokers so you have the election logs only.
You should see the following in the logs in the consoles of the two Kafka brokers.
Broker -- Kafka Server
Note: Given the scaladoc of KafkaServer, a Kafka server, a Kafka broker and a Kafka node all refer to the same concept and are hence considered synonyms.
A broker's prime responsibility is to bring sellers and buyers together and thus a broker is the third-person facilitator between a buyer and a seller.
A Kafka broker receives messages from producers and stores them on disk keyed by unique
offset.
A Kafka broker allows consumers to fetch messages by topic, partition and offset.
Kafka brokers can create a Kafka cluster by sharing information between each other directly
or indirectly using Zookeeper.
./bin/zookeeper-server-start.sh config/zookeeper.properties
Only when Zookeeper is up and running can you start a Kafka server (that will connect to
Zookeeper).
./bin/kafka-server-start.sh config/server.properties
kafka-server-start.sh script
kafka-server-start.sh starts a Kafka broker.
$ ./bin/kafka-server-start.sh
USAGE: ./bin/kafka-server-start.sh [-daemon] server.properties [--override property=va
lue]*
KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:config/log4j.properties"
Command-line options:
4. --override property=value -- a value that should override the value set for property in the server.properties file.
Topics
Topics are virtual groups of one or many partitions across Kafka brokers in a Kafka cluster.
A single Kafka broker stores messages in a partition in an ordered fashion, i.e. appends
them one message after another and creates a log file.
Producers write messages to the tail of these logs that consumers read at their own pace.
Kafka scales topic consumption by distributing partitions among a consumer group, which is
a set of consumers sharing a common group identifier.
Partitions
Partitions with messages -- topics can be partitioned to improve read/write performance
and resiliency. You can lay out a topic (as partitions) across a cluster of machines to allow
data streams larger than the capability of a single machine. Partitions are log files on disk
with sequential write only. Kafka guarantees message ordering in a partition.
The log end offset is the offset of the last message written to a log.
The high watermark offset is the offset of the last message that was successfully copied to
all of the log's replicas.
Note: A consumer can only read up to the high watermark offset to prevent reading unreplicated messages.
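The log end offset of a partition can be read with a consumer, e.g. via endOffsets. A sketch, assuming the topic my-topic exists on a local broker:

import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

object EndOffsetDemo extends App {
  val props = new Properties
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  val consumer = new KafkaConsumer[String, String](props)

  // the offset of the next message to be written to partition 0
  val tp = new TopicPartition("my-topic", 0)
  val endOffsets = consumer.endOffsets(Collections.singleton(tp))
  println(s"log end offset of $tp: ${endOffsets.get(tp)}")
  consumer.close()
}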
Messages
Messages are the data that brokers store in the partitions of a topic.
Messages are sequentially appended to the end of the partition log file and numbered by unique offsets. They are persisted on disk (aka disk-based persistence) and replicated within the cluster to prevent data loss. Kafka relies on an in-memory page cache to improve data reads. Messages stay in partitions until deleted, when their TTL expires or after compaction.
Offsets
Offsets are message positions in a topic.
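A consumer can position itself at an explicit offset with seek. A sketch; the topic, partition and offset are assumptions:

import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

object SeekDemo extends App {
  val props = new Properties
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  val consumer = new KafkaConsumer[String, String](props)

  val tp = new TopicPartition("my-topic", 0)
  consumer.assign(Collections.singleton(tp))
  consumer.seek(tp, 42L)         // the next poll starts reading at offset 42
  println(consumer.position(tp)) // 42
  consumer.close()
}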
Kafka Clients
Producers
Consumers
Producers
Multiple concurrent producers send (aka push) messages to topics, which appends the messages to the end of partitions. Producers can batch messages before they are sent over the wire to a topic. Producers support message compression. Producers can send messages in synchronous (with acknowledgement) or asynchronous mode.
import org.apache.kafka.common.serialization._
import org.apache.kafka.clients.producer.KafkaProducer
import org.apache.kafka.clients.producer.ProducerRecord

val props = new java.util.Properties
props.put("bootstrap.servers", "localhost:9092")
val producer = new KafkaProducer[String, String](props, new StringSerializer, new StringSerializer)
val f = producer.send(new ProducerRecord("my-topic", "hello"))

scala> f.get
res7: org.apache.kafka.clients.producer.RecordMetadata = my-topic-0@1

producer.close
KafkaProducer
KafkaProducer is one of the main parts of the public API of Apache Kafka to write Kafka producers.

FIXME
Sender
Sender is a thread of execution that handles the sending of produce requests to a Kafka
cluster.
sendProducerData FIXME
run FIXME
Note run is used exclusively when Sender is started (as a thread of execution).
void run()
Note run is a part of java.lang.Runnable that is executed when the thread is started.
run first prints out the following DEBUG message to the logs:
run keeps running (with the current time in milliseconds) until running flag is turned off.
run FIXME
LogContext
KafkaClient
Metadata
RecordAccumulator
guaranteeMessageOrder flag
maxRequestSize
acks
SenderMetricsRegistry
Time
requestTimeout
retryBackoffMs
TransactionManager
ApiVersions
Kafka Consumers
Multiple concurrent consumers read (aka pull) messages from topics however they want
using offsets. Unlike typical messaging systems, Kafka consumers pull messages from a
topic using offsets.
Note: Kafka 0.9.0.0 was about introducing a brand new Consumer API, aka the New Consumer.
When a consumer is created, it requires bootstrap.servers, which is the initial list of brokers to discover the full set of alive brokers in a cluster from.
A consumer has to subscribe to the topics it wants to read messages from, which is called topic subscription.
Consumer Contract
Topic Subscription
Topic Subscription is the process of announcing the topics a consumer wants to read
messages from.
Note: the subscribe method is not incremental and you always must include the full list of topics that you want to consume from.
You can change the set of topics a consumer is subscribed to at any time and (given the note above) any topics previously subscribed to will be replaced by the new list after subscribe.
Caution FIXME
Consumer Groups
A consumer group is a set of Kafka consumers that share a common group identifier (i.e. group.id).
Caution FIXME
the new consumer uses a group coordination protocol built into Kafka
For each group, one of the brokers is selected as the group coordinator. The
coordinator is responsible for managing the state of the group. Its main job is to mediate
partition assignment when new members arrive, old members depart, and when topic
metadata changes. The act of reassigning partitions is known as rebalancing the group.
When a group is first initialized, the consumers typically begin reading from either the
earliest or latest offset in each partition. The messages in each partition log are then
read sequentially. As the consumer makes progress, it commits the offsets of messages
it has successfully processed.
When a partition gets reassigned to another consumer in the group, the initial position is
set to the last committed offset. If a consumer suddenly crashed, then the group
member taking over the partition would begin consumption from the last committed
offset (possibly reprocessing messages that the failed consumer would have processed
already but not committed yet).
KafkaConsumer -- Main Class For Kafka Consumers
KafkaConsumer is a part of the public API and is created with properties and (key and value)
deserializers as configuration.
// sandbox/kafka-sandbox
val bootstrapServers = "localhost:9092"
val groupId = "kafka-sandbox"
import org.apache.kafka.clients.consumer.ConsumerConfig
val configs: Map[String, Object] = Map(
// required properties
ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> bootstrapServers,
ConsumerConfig.GROUP_ID_CONFIG -> groupId
)
import org.apache.kafka.common.serialization.StringDeserializer
val keyDeserializer = new StringDeserializer
val valueDeserializer = new StringDeserializer
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
val consumer = new KafkaConsumer[String, String](
configs.asJava,
keyDeserializer,
valueDeserializer)
Figure 1. KafkaConsumer
Important: KafkaConsumer does not support multi-threaded access. You should use only one thread per KafkaConsumer instance. KafkaConsumer uses light locks to protect itself from multi-threaded access and reports ConcurrentModificationException when that happens.
client -- ConsumerNetworkClient. Used mainly (?) to create the Fetcher and ConsumerCoordinator. Used also in poll, pollOnce and wakeup (but I think the usage should be limited to creating the Fetcher and ConsumerCoordinator).

clientId

coordinator -- ConsumerCoordinator

fetcher -- Fetcher. Created right when KafkaConsumer is created. Used when...FIXME

metadata -- Metadata. Created right when KafkaConsumer is created. Used when...FIXME

metrics -- Metrics
Refer to Logging.
unsubscribe Method
Caution FIXME
subscribe FIXME
import scala.collection.JavaConverters._
consumer.subscribe(topics.asJava)
Internally, subscribe prints out the following DEBUG message to the logs:
subscribe then requests SubscriptionState to subscribe for the topics and listener .
In the end, subscribe requests SubscriptionState for groupSubscription that it then passes
along to Metadata to set the topics to track.
val seconds = 10
while (true) {
println(s"Polling for records for $seconds secs")
val records = consumer.poll(seconds * 1000)
// do something with the records here
}
If there are records available, poll checks Fetcher for sendFetches and ConsumerNetworkClient for pendingRequestCount. If either is positive, poll requests ConsumerNetworkClient to pollNoWakeup.
Caution FIXME Make the above more user-friendly, e.g. when could interceptors be empty?
Caution FIXME
endOffsets Method
Caution FIXME
offsetsForTimes Method
Caution FIXME
updateFetchPositions Method
Caution FIXME
Caution FIXME
Internally, listTopics simply requests Fetcher for metadata for all topics and returns it.
beginningOffsets Method
Note: The KafkaConsumer API offers other constructors that in the end use the public 3-argument constructor that in turn passes the call on to the private internal constructor.
// Public API
KafkaConsumer(
Map<String, Object> configs,
Deserializer<K> keyDeserializer,
Deserializer<V> valueDeserializer)
When created, KafkaConsumer adds the keyDeserializer and valueDeserializer to configs (as
key.deserializer and value.deserializer properties respectively) and creates a
ConsumerConfig.
KafkaConsumer(
ConsumerConfig config,
Deserializer<K> keyDeserializer,
Deserializer<V> valueDeserializer)
When called, the internal KafkaConsumer constructor prints out the following DEBUG
message to the logs:
KafkaConsumer sets the internal clientId to client.id or generates one with the prefix consumer- (followed by a sequence number) if the property is not set.
KafkaConsumer sets the internal Metrics (and JmxReporter with kafka.consumer prefix).
1. retryBackoffMs
2. metadata.max.age.ms
3. allowAutoTopicCreation enabled
4. topicExpiryEnabled disabled
interceptor.classes property
fetch.min.bytes
fetch.max.bytes
fetch.max.wait.ms
max.partition.fetch.bytes
max.poll.records
check.crcs
In the end, KafkaConsumer prints out the following DEBUG message to the logs:
wakeup Method
void wakeup()
Note: wakeup causes the first selection operation that has not yet returned to return immediately. Read about selection in java.nio.channels.Selector's javadoc.
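A common pattern is to call wakeup from another thread to break a long-running poll. A sketch that reuses the consumer created earlier in this chapter; the shutdown trigger is up to the application:

import org.apache.kafka.common.errors.WakeupException

val poller = new Thread(() => {
  try {
    while (true) {
      val records = consumer.poll(Long.MaxValue)
      // process the records here
    }
  } catch {
    case _: WakeupException => // expected on shutdown
  } finally consumer.close()
})
poller.start()

// later, e.g. from a shutdown hook:
consumer.wakeup() // the blocked poll above throws WakeupException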
Deserializer
Caution FIXME
ConsumerConfig
Caution FIXME
Consumer
Consumer is the contract for Kafka consumers.
Note: KafkaConsumer is the main public class that Kafka developers use to write Kafka consumers.
Consumer Contract
package org.apache.kafka.clients.consumer;

public interface Consumer<K, V> extends Closeable {
  void commitSync();
  void commitSync(Map<TopicPartition, OffsetAndMetadata> offsets);
  void commitAsync();
  void commitAsync(OffsetCommitCallback callback);
  void commitAsync(Map<TopicPartition, OffsetAndMetadata> offsets, OffsetCommitCallback callback);
  void close();
  void close(long timeout, TimeUnit unit);
  void wakeup();
}
ConsumerInterceptor
Example
package pl.jaceklaskowski.kafka
import java.util
onConsume Method
Caution FIXME
Clusters
A Kafka cluster is the central data exchange backbone for an organization.
Sensor
Sensor is...FIXME
MetricsReporter
JmxReporter
JmxReporter is a metrics reporter that is always included in the metric.reporters setting.
ProducerMetrics
ProducerMetrics is...FIXME
SenderMetrics
SenderMetrics is...FIXME
Kafka Tools
TopicCommand
kafka.admin.TopicCommand
ConsoleProducer
kafka.tools.ConsoleProducer
ConsoleConsumer
kafka.tools.ConsoleConsumer
kafka-configs.sh
./bin/kafka-configs.sh \
--zookeeper localhost:2181 \
--alter \
--entity-type topics \
--entity-name test \
--add-config retention.ms=5000
kafka-consumer-groups.sh
kafka-topics.sh
Kafka Properties
Table 1. Properties (Name -- Default Value -- Importance -- Description)

auto.offset.reset -- none: throw an exception to the consumer if no previous offset is found for the consumer's group; anything else: throw an exception to the consumer
authorizer.class.name
bootstrap.servers -- (empty) -- Yes -- A comma-separated list of host:port pairs that are the addresses of the brokers in a "bootstrap" Kafka cluster, e.g. localhost:9092
broker.rack
check.crcs
client.id -- (random-generated)
enable.auto.commit
fetch.min.bytes
fetch.max.bytes
fetch.max.wait.ms
heartbeat.interval.ms -- The expected time between heartbeats to the consumer coordinator when using Kafka's group management facilities
inter.broker.protocol.version
interceptor.classes -- (empty) -- Comma-separated list of interceptor classes (set programmatically with props.put(ConsumerConfig.INTERCEPTOR_CLASSES_CONFIG, ...))
max.poll.records
metadata.max.age.ms
request.timeout.ms
sasl.enabled.mechanisms
import org.apache.kafka.connect.runtime.distributed.DistributedConfig
DistributedConfig.SESSION_TIMEOUT_MS_CONFIG
bootstrap.servers Property
bootstrap.servers is a comma-separated list of host and port pairs that are the addresses
of the Kafka brokers in a "bootstrap" Kafka cluster that a Kafka client connects to initially to
bootstrap itself.
localhost:9092
localhost:9092,another.host:9092
bootstrap.servers provides the initial hosts that act as the starting point for a Kafka client to discover the full set of alive servers in the cluster.
Note: Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list does not have to contain the full set of servers (you may want more than one, though, in case a server is down).
Tip: Use the org.apache.kafka.clients.CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG public value to refer to the property.
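For example, a tiny sketch:

import java.util.Properties
import org.apache.kafka.clients.CommonClientConfigs

// refer to the property via the public constant instead of a raw string
val props = new Properties
props.put(CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092,another.host:9092")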
client.id Property
An optional identifier of a Kafka consumer (in a consumer group) that is passed to a Kafka
broker with every request.
The sole purpose of this is to be able to track the source of requests beyond just IP and port by allowing a logical application name to be included in Kafka logs and monitoring aggregates.
enable.auto.commit Property
enable.auto.commit FIXME
By default, as the consumer reads messages from Kafka, it will periodically commit its
current offset (defined as the offset of the next message to be read) for the partitions it is
reading from back to Kafka. Often you would like more control over exactly when offsets are
committed. In this case you can set enable.auto.commit to false and call the commit
method on the consumer.
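A sketch of such a manual-commit loop; the topic name, group ID and broker address are assumptions:

import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer

object ManualCommit extends App {
  val props = new Properties
  props.put("bootstrap.servers", "localhost:9092")
  props.put("group.id", "manual-commit-demo")
  props.put("enable.auto.commit", "false") // take over commit responsibility
  props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

  val consumer = new KafkaConsumer[String, String](props)
  consumer.subscribe(Collections.singleton("my-topic"))
  while (true) {
    val records = consumer.poll(1000)
    records.forEach(r => println(s"${r.partition}/${r.offset}: ${r.value}"))
    consumer.commitSync() // commit only after the records were processed
  }
}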
group.id Property
group.id specifies the name of the consumer group a Kafka consumer belongs to.
When the Kafka consumer is constructed and group.id does not exist yet (i.e. there are no
existing consumers that are part of the group), the consumer group will be created
automatically.
retry.backoff.ms Property
retry.backoff.ms is the time to wait before attempting to retry a failed request to a given
topic partition.
This avoids repeatedly sending requests in a tight loop under some failure scenarios.
Logging
Kafka Broker
A Kafka broker (started using kafka-server-start.sh ) uses config/log4j.properties for
logging configuration.
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n
log4j.appender.kafkaAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.kafkaAppender.DatePattern='.'yyyy-MM-dd-HH
log4j.appender.kafkaAppender.File=${kafka.logs.dir}/server.log
log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
log4j.logger.kafka.controller.KafkaController=DEBUG, stdout
Kafka Tools
Kafka tools like kafka-console-consumer.sh (that uses KafkaConsumer under the covers)
use config/tools-log4j.properties file.
Kafka uses Simple Logging Facade for Java (SLF4J) for logging.
build.sbt
libraryDependencies += "org.slf4j" % "slf4j-simple" % "1.8.0-alpha2"
Tip: Replace slf4j's simple binding to switch between logging frameworks (e.g. slf4j-log4j12 for log4j).
build.sbt
val logback = "1.2.3"
libraryDependencies += "ch.qos.logback" % "logback-core" % logback
libraryDependencies += "ch.qos.logback" % "logback-classic" % logback
With logback's configuration (as described in the above tip) you may see the following message:
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n
log4j.logger.org.apache.kafka.clients.consumer.ConsumerConfig=DEBUG
KAFKA_LOG4J_OPTS=-Dlog4j.configuration=file:[your-log4j-configuration-here]
WorkerGroupMember
Caution FIXME WorkerCoordinator? DistributedHerder?
ConnectDistributed
ConnectDistributed is a command-line utility that runs Kafka Connect in distributed mode.
Caution FIXME Doh, I'd rather not enter Kafka Connect yet. Not interested in it yet.
Gradle Tips
Building Kafka Distribution
It takes around 2 minutes (after all the dependencies were downloaded once).
Zookeeper Tips
The zookeeper shell shipped with Kafka works with no support for command line history
because jline jar is missing (see KAFKA-2385).
A solution is to use the official distribution of Apache Zookeeper (3.4.10 as of this writing)
from Apache ZooKeeper Releases.
Once downloaded, use ./bin/zkCli.sh to connect to Zookeeper that is used for Kafka.
[zk: localhost:2181(CONNECTED) 0] ls /
[cluster, controller_epoch, controller, brokers, zookeeper, admin, isr_change_notifica
tion, consumers, log_dir_event_notification, latest_producer_id_block, config]
Kafka in Scala REPL for Interactive Exploration
Note: The reason for executing the console command after sbt has started up is that command history did not work using the key-up and key-down keys. YMMV.
build.sbt
name := "kafka-sandbox"
version := "1.0"
scalaVersion := "2.12.3"
kafka-sandbox sbt
[info] Loading settings from plugins.sbt ...
[info] Loading project definition from /Users/jacek/dev/sandbox/kafka-sandbox/project
[info] Loading settings from build.sbt ...
[info] Set current project to kafka-sandbox (in build file:/Users/jacek/dev/sandbox/ka
fka-sandbox/)
[info] sbt server started at 127.0.0.1:4408
sbt:kafka-sandbox> console
[info] Starting scala interpreter...
Welcome to Scala 2.12.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144).
Type in expressions for evaluation. Or try :help.
Further reading or watching
Articles
1. Apache Kafka for Beginners - an excellent article that you should start your Kafka
journey with.