Understanding Message Brokers
Learn the Mechanics of Messaging
through ActiveMQ and Kafka
Jakub Korab
978-1-491-98153-5
Table of Contents
1. Introduction
   What Is a Messaging System, and Why Do We Need One?
2. ActiveMQ
   Connectivity
   The Performance-Reliability Trade-off
   Message Persistence
   Disk Performance Factors
   The JMS API
   How Queues Work: A Tale of Two Brains
   Caches, Caches Everywhere
   Internal Contention
   Transactions
   Consuming Messages from a Queue
   High Availability
   Scaling Up and Out
   Summary
3. Kafka
   Unified Destination Model
   Consuming Messages
   Partitioning
   Sending Messages
   Producer Considerations
   Consumption Revisited
   High Availability
   Summary
4. Messaging Considerations and Patterns
   Dealing with Failure
   Preventing Duplicate Messages with Idempotent Consumption
   What to Consider When Looking at Messaging Technologies
5. Conclusion
CHAPTER 1
Introduction
Without a high-level understanding of how brokers work, people
make seemingly sensible assertions about their messaging systems
such as:
As long as the two systems agree on the shape of those messages and
the way in which they will send the messages to each other, it is then
possible for them to communicate with each other without concern
for how the other system is implemented. The internals of those
systems, such as the programming language or the application
frameworks used, can vary over time. As long as the contract itself is
maintained, then communication can continue with no change from
the other side. The two systems are effectively decoupled by that
interface.
Messaging systems typically involve the introduction of an intermediary
between the two systems that are communicating in order to
further decouple the sender from the receiver or receivers. In doing
so, the messaging system allows a sender to send a message without
knowing where the receiver is, whether it is active, or indeed how
many instances of it there are.
Let's consider a couple of analogies of the types of problems that a
messaging system addresses and introduce some basic terms.
Point-to-Point
Alexandra walks into the post office to send a parcel to Adam. She
walks up to the counter and hands the teller the parcel. The teller
places the parcel behind the counter and gives Alexandra a receipt.
Adam does not need to be at home at the moment that the parcel is
sent. Alexandra trusts that the parcel will be delivered to Adam at
some point in the future, and is free to carry on with the rest of her
day. At some point later, Adam receives the parcel.
This is an example of the point-to-point messaging domain. The post
office here acts as a distribution mechanism for parcels, guaranteeing
that each parcel will be delivered once. Using the post office
separates the act of sending a parcel from the delivery of the parcel.
In classical messaging systems, the point-to-point domain is implemented
through queues. A queue acts as a first in, first out (FIFO)
buffer to which one or more consumers can subscribe. Each message
is delivered to only one of the subscribed consumers. Queues will
typically attempt to distribute the messages fairly among the
consumers.
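The queue behavior described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how any broker is implemented: the function name dispatch_from_queue and the strict round-robin policy are assumptions chosen to stand in for "fair" distribution.

```python
from collections import deque
from itertools import cycle


def dispatch_from_queue(messages, consumers):
    """Deliver each message to exactly one consumer, oldest message first."""
    queue = deque(messages)                      # FIFO buffer
    received = {name: [] for name in consumers}
    turn = cycle(consumers)                      # assumed fairness policy: round-robin
    while queue:
        message = queue.popleft()                # take the oldest message
        received[next(turn)].append(message)     # only one consumer gets it
    return received


deliveries = dispatch_from_queue(["A", "B", "C", "D"], ["C1", "C2"])
print(deliveries)
```

No message appears in more than one consumer's list, which is the defining property of the point-to-point domain.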
Queues are termed as being durable. Durability is a quality of service
that guarantees that the messaging system will retain messages in
Publish-Subscribe
Gabriella dials in to a conference call. While she is connected, she
hears everything that the speaker is saying, along with the rest of
the call participants. When she disconnects, she misses out on what
is said. On reconnecting, she continues to hear what is being said.
This is an example of the publish-subscribe messaging domain. The
conference call acts as a broadcast mechanism. The person speaking
does not care how many people are currently dialed into the call;
the system guarantees that anyone who is currently dialed in will
hear what is being said.
In classical messaging systems, the publish-subscribe messaging
domain is implemented through topics. A topic provides the same
sort of broadcast facility as the conference call mechanism. When a
message is sent into a topic, it is distributed to all subscribed
consumers.
Topics are typically nondurable. Much like the listener who does not
hear what is said on the conference call when she disconnects, topic
subscribers miss any messages that are sent while they are offline.
For this reason, it can be said that topics provide an at-most-once
delivery guarantee for each consumer.
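As a rough illustration of nondurable topic semantics, the following Python sketch models a topic as a set of currently connected inboxes. The Topic class and its method names are hypothetical and exist only for this example.

```python
class Topic:
    """In-memory model of a nondurable topic (illustrative only)."""

    def __init__(self):
        self.subscribers = {}                 # name -> inbox, only while connected

    def subscribe(self, name, inbox):
        self.subscribers[name] = inbox

    def unsubscribe(self, name):
        self.subscribers.pop(name, None)

    def publish(self, message):
        # Broadcast to whoever is connected right now; nothing is retained.
        for inbox in self.subscribers.values():
            inbox.append(message)


call = Topic()
gabriella, other = [], []
call.subscribe("gabriella", gabriella)
call.subscribe("other", other)
call.publish("one")
call.unsubscribe("gabriella")                 # she drops off the call
call.publish("two")                           # missed by gabriella
call.subscribe("gabriella", gabriella)        # she dials back in
call.publish("three")
print(gabriella, other)
```

The reconnecting subscriber never sees the message published while it was away, which is the at-most-once behavior described above.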
Publish-subscribe messaging is typically used when messages are
informational in nature and the loss of a single message is not
particularly significant. For example, a topic might transmit temperature
readings from a group of sensors once every second. A system
that subscribes to the topic and is interested in the current temperature
will not be concerned if it misses a message; another will
arrive shortly.
Hybrid Models
A store's website places order messages onto a message queue. A
fulfilment system is the primary consumer of those messages. In
addition, an auditing system needs to have copies of these order
messages for tracking later on. Neither system can miss messages,
even if the systems themselves are unavailable for some time. The
website should not be aware of the other systems.
Use cases often call for a hybrid of publish-subscribe and point-to-
point messaging, such as when multiple systems each want a copy of
a message and require both durability and persistence to prevent
message loss.
These cases call for a destination (the general term for queues and
topics) that distributes messages much like a topic, such that each
message is sent to every distinct system interested in those messages, but
where each system can define multiple consumers that consume the
inbound messages, much like a queue. The consumption type in this
case is once-per-interested-party. These hybrid destinations
frequently require durability, such that if a consumer disconnects, the
messages that are sent in the meantime are received once the
consumer reconnects.
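One way to picture once-per-interested-party consumption is a destination that keeps a separate durable backlog per interested system. The sketch below is an in-memory model under that assumption; HybridDestination and the party names are hypothetical.

```python
from collections import deque


class HybridDestination:
    """Every interested party gets its own backlog; consumers within a
    party share that backlog like a queue (illustrative model only)."""

    def __init__(self, parties):
        self.backlogs = {party: deque() for party in parties}

    def send(self, message):
        # Topic-like: a copy is kept for every interested party.
        for backlog in self.backlogs.values():
            backlog.append(message)

    def receive(self, party):
        # Queue-like: any consumer of the party takes the next message.
        backlog = self.backlogs[party]
        return backlog.popleft() if backlog else None


orders = HybridDestination(["fulfilment", "audit"])
orders.send("order-1")
orders.send("order-2")

# Two fulfilment consumers share the work; each message is taken once.
first = orders.receive("fulfilment")
second = orders.receive("fulfilment")

# The audit system was offline while the orders were sent, yet misses nothing.
audited = [orders.receive("audit"), orders.receive("audit")]
print(first, second, audited)
```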
Hybrid models are not new and can be addressed in most messaging
systems, including both ActiveMQ (via virtual or composite
destinations, which compose topics and queues) and Kafka (implicitly, as a
fundamental design feature of its destination).
Now that we have some basic terminology and an understanding of
why we might want to use a messaging system, let's jump into the
details.
The client and broker communicate with each other through an
application layer protocol, also known as a wire protocol
(Figure 2-1). The JMS specification left the details of this protocol
up to individual implementations.
Connectivity
While the API and expected behavior were well defined by JMS, the
actual protocol for communication between the client and the
broker was deliberately left out of the JMS specification, so that
existing brokers could be made JMS-compatible. As such, ActiveMQ
was free to define its own wire protocol, OpenWire. OpenWire is
used by the ActiveMQ JMS client library implementation, as well as
its .NET and C++ counterparts, NMS and CMS, which are subprojects
of ActiveMQ, hosted at the Apache Software Foundation.
Over time, support for other wire protocols was added into
ActiveMQ, which increased its interoperability options from other
languages and environments:
AMQP 1.0
The Advanced Message Queuing Protocol (ISO/IEC
19464:2014) should not be confused with its 0.X predecessors,
which are implemented in other messaging systems, in particular
within RabbitMQ, which uses 0.9.1. AMQP 1.0 is a general-purpose
binary protocol for the exchange of messages between
two peers. It does not have the notion of clients or brokers, and
includes features such as flow control, transactions, and various
qualities of service (at-most-once, at-least-once, and
exactly-once).
STOMP
Simple/Streaming Text Oriented Messaging Protocol, an
easy-to-implement protocol that has dozens of client implementations
across various languages.
XMPP
Extensible Messaging and Presence Protocol. Originally called
Jabber, this XML-based protocol was designed for
chat systems, but has been extended beyond its initial use cases
to include publish-subscribe messaging.
MQTT
A lightweight, publish-subscribe protocol (ISO/IEC
20922:2016) used for Machine-to-Machine (M2M) and Internet
of Things (IoT) applications.
ActiveMQ also supports the layering of the above protocols over
WebSockets, which enables full duplex communication between
applications in a web browser and destinations in the broker.
With this in mind, these days when we talk about ActiveMQ, we no
longer refer exclusively to a communications stack based on the
JMS/NMS/CMS libraries and the OpenWire protocol. It is becoming
quite common to mix and match languages, platforms, and
external libraries that are best suited to the application at hand. It is
possible, for example, to have a JavaScript application running in a
browser using the Eclipse Paho MQTT library to send messages to
ActiveMQ over WebSockets, and have those messages consumed by
a C++ server process that uses AMQP via the Apache Qpid Proton
library. From this perspective, the messaging landscape is becoming
much more diverse.
Looking to the future, AMQP in particular is going to feature much
more heavily than it has to date as components that are neither
clients nor brokers become a more familiar part of the messaging
landscape. The Apache Qpid Dispatch Router, for example, acts as a
message router that clients connect to directly, allowing different
destinations to be handled by distinct brokers, as well as providing a
sharding facility.
When dealing with third-party libraries and external components,
you need to be aware that they are of variable quality and may not
be compatible with the features provided within ActiveMQ. As a
very simple example, it is not possible to send messages to a queue
via MQTT (without a bit of routing configured within the broker).
As such, you will need to spend some time working through the
options to determine the messaging stack most appropriate for your
application requirements.
sistent storage. The cost of this particular decision is unfortunately
quite high.
Consider that writing a megabyte of data to disk is between 100 and
1,000 times slower than writing it to memory.
As such, it is up to the application developer to make a decision as
to whether the price of message reliability is worth the associated
performance cost. Decisions such as these need to be made on a
per-use-case basis.
The performance-reliability trade-off is based on a spectrum of
choices. The higher the reliability, the lower the performance. If you
decide to make the system less reliable, say by keeping messages in
memory only, your performance will increase significantly. The JMS
defaults that ActiveMQ comes tuned with out of the box favor
reliability. There are numerous mechanisms that allow you to tune the
broker, and your interaction with it, to the position on this spectrum
that best addresses your particular messaging use cases.
This trade-off applies at the level of individual brokers. However an
individual broker is tuned, it is possible to scale messaging beyond
this point through careful consideration of message flows and
separation of traffic out over multiple brokers. This can be achieved by
giving certain destinations their own brokers, or by partitioning the
overall stream of messages either at the application level or through
the use of an intermediary component. We will look more closely at
how to consider broker topologies later on.
Message Persistence
ActiveMQ comes with a number of pluggable strategies for persisting
messages. These take the form of persistence adapters, which
can be thought of as engines for the storage of messages. These
include disk-based options such as KahaDB and LevelDB, as well as
the possibility of using a database via JDBC. As the former are most
commonly used, we will focus our discussion on those.
When persistent messages are received by a broker, they are first
written to disk into a journal. A journal is an append-only
disk-based data structure made up of multiple files. Incoming messages
are serialized into a protocol-independent object representation by
the broker and are then marshaled into a binary form, which is then
written to the end of the journal. The journal contains a log of all
incoming messages, as well as details of those messages that have
been acknowledged as consumed by the client.
Disk-based persistence adapters maintain index files which keep
track of where the next messages to be dispatched are positioned
within the journal. When all of the messages from a journal file have
been consumed, it will either be deleted or archived by a background
worker thread within ActiveMQ. If the index is corrupted
during a broker failure, then ActiveMQ will rebuild it based on the
information within the journal files.
Messages from all queues are written in the same journal files, which
means that if a single message is unconsumed, the entire file (usually
either 32 MB or 100 MB in size by default, depending on the
persistence adapter) cannot be cleaned up. This can cause problems with
running out of disk space over time.
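The cleanup problem can be made concrete with a toy model. The sketch below is an assumption-laden simplification: journal files roll over after a fixed number of messages rather than a size in megabytes, and the Journal class and message identifiers are invented for illustration.

```python
class Journal:
    """Toy append-only journal shared by all queues (illustrative only)."""

    def __init__(self, messages_per_file=3):
        self.messages_per_file = messages_per_file
        self.files = [[]]                    # each file is a list of message ids
        self.acked = set()                   # ids acknowledged as consumed

    def append(self, message_id):
        if len(self.files[-1]) == self.messages_per_file:
            self.files.append([])            # current file is full: start a new one
        self.files[-1].append(message_id)

    def acknowledge(self, message_id):
        self.acked.add(message_id)

    def removable_files(self):
        """Indexes of full files in which every message has been consumed."""
        return [
            index
            for index, file in enumerate(self.files)
            if len(file) == self.messages_per_file
            and all(message_id in self.acked for message_id in file)
        ]


journal = Journal(messages_per_file=3)
for message_id in ["q1-a", "q2-a", "q1-b", "q1-c", "q2-b", "q1-d"]:
    journal.append(message_id)

# Everything is consumed except one message on the slow queue q2.
for message_id in ["q1-a", "q1-b", "q1-c", "q2-b", "q1-d"]:
    journal.acknowledge(message_id)
before = journal.removable_files()           # first file is pinned by "q2-a"

journal.acknowledge("q2-a")
after = journal.removable_files()
print(before, after)
```

A single unconsumed message keeps its whole file on disk, regardless of how busy or idle the other queues are.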
Figure 2-2. Pipe model of disk performance
Connection
This is a long-lived object that is roughly analogous to a TCP
connection: once established, it typically lives for the lifetime
of the application until it is shut down. A connection is thread-safe
and can be worked with by multiple threads at the same
time. Connection objects allow you to create Session objects.
Session
This is a thread's handle on communication with a broker. Sessions
are not thread-safe, which means that they cannot be
accessed by multiple threads at the same time. A Session is the
main transactional handle through which the programmer may
commit and roll back messaging operations, if it is running in
transacted mode. Using this object, you create Message,
MessageConsumer, and MessageProducer objects, as well as get handles
on Topic and Queue objects.
MessageProducer
This interface allows you to send a message to a destination.
MessageConsumer
This interface allows the developer to receive messages. There
are two mechanisms for retrieving a message:
• Registering a MessageListener. This is a message handler
  interface implemented by you that will sequentially process
  any messages pushed by the broker using a single thread.
• Polling for messages using the receive() method.
Message
This is probably the most important construct as it is the one
that carries your data. Messages in JMS are composed of two
aspects:
• Metadata about the message. A message contains headers
  and properties. Both of these can be thought of as entries in
  a map. Headers are well-known entries, specified by the
  JMS specification and accessible directly via the API, such
  as JMSDestination and JMSTimestamp. Properties are arbitrary
  pieces of information about the message that you set
  to simplify message processing or routing, without the need
  to read the message payload itself. You may, for instance, set
  an AccountID or OrderType property.
• The body of the message. A number of different message
  types can be created from a Session, based on the type of
  content that will be sent in the body, the most common
  being TextMessage for strings and BytesMessage for binary
  data.
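To illustrate the split between metadata and body, here is a small Python model. It is not the JMS API: the Message dataclass, the route function, and the destination names are hypothetical, and only the header names JMSDestination and JMSTimestamp come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Simplified stand-in for a JMS message (not the real API)."""
    body: bytes
    headers: dict = field(default_factory=dict)      # well-known entries
    properties: dict = field(default_factory=dict)   # application-defined entries


def route(message):
    """Choose a handler from a property alone; the body is never read."""
    if message.properties.get("OrderType") == "express":
        return "priority-orders"
    return "standard-orders"


order = Message(
    body=b"<opaque payload>",
    headers={"JMSDestination": "orders", "JMSTimestamp": 1_700_000_000_000},
    properties={"AccountID": "ACC-42", "OrderType": "express"},
)
print(route(order))
```

Because routing depends only on properties, an intermediary can make the decision without deserializing the payload.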
Once the broker is satisfied that the message has been stored, it
responds with an acknowledgement back to the client (4). The client
thread that originally invoked the send() operation is then free to
continue performing its processing.
This waiting for acknowledgement of persistent messages is fundamental
to the guarantee that the JMS API provides: if you want the
message to be persisted, you presumably also care about whether the
message was accepted by the broker in the first place. There are a
number of reasons why this might not be possible, for instance, a
memory or storage limit being reached. Instead of crashing, the
broker will either pause the send operation, causing the producer to
wait until there are enough system resources to process the message
(a process called Producer Flow Control), or it will send a negative
acknowledgement back to the producer, triggering an exception to
be thrown. The exact behavior is configurable on a per-broker basis.
There is a substantial amount of I/O interaction happening in this
simple operation, with two network operations between the
producer and the broker, one storage operation, and a confirmation
step. The storage operation could be a simple disk write or another
network hop to a storage server.
This raises an important point about message brokers: they are
extremely I/O intensive and very sensitive to the underlying
infrastructure, in particular, disks.
Let's take a closer look at the confirmation step (3) in the above
interaction. If the persistence adapter is file based, then storing a
message involves a write to the filesystem. If this is the case, then
why would we need a confirmation that a write has been completed?
Surely the act of completing a write means that a write has
occurred?
Not quite. As tends to be the case with these things, the closer you
look at something, the more complex it turns out to be. The culprit
in this particular case is caches.
This syncing behavior is a JMS requirement to ensure that all
messages that are marked as persistent are actually saved to disk, and is
therefore performed after the receipt of each message or set of
related messages in a transaction. As such, the speed with which the
disk can sync() is of critical importance to the performance of the
broker.
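The difference between a write and a synced write can be shown with standard operating system calls. The following Python sketch appends a record and then asks the operating system to flush it to the storage device; the function name append_durably and the file name journal.log are illustrative, and whether the device's own cache is also flushed depends on the hardware and filesystem.

```python
import os
import tempfile


def append_durably(path, record):
    """Append a record and sync it before returning."""
    with open(path, "ab") as journal:
        journal.write(record)         # may only reach the process buffer
        journal.flush()               # hand the bytes to the OS page cache
        os.fsync(journal.fileno())    # ask the OS to push them to the device


with tempfile.TemporaryDirectory() as directory:
    journal_path = os.path.join(directory, "journal.log")
    append_durably(journal_path, b"message-1\n")
    append_durably(journal_path, b"message-2\n")
    with open(journal_path, "rb") as journal:
        stored = journal.read()

print(stored)
```

Each call pays the cost of one sync, which is why sync latency bounds how quickly persistent messages can be accepted one at a time.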
Internal Contention
The use of a single journal for all queues adds an additional
complication. At any given time, there may be multiple producers all
sending messages. Within the broker, there are multiple threads that
receive these messages from the inbound socket connections. Each
thread needs to persist its message to the journal. As it is not
possible for multiple threads to write to the same file at the same time
because the writes would conflict with each other, the writes need to
be queued up through the use of a mutual exclusion mechanism. We
call this thread contention.
Each message must be fully written and synced before the next
message can be processed. This limitation impacts all queues in the
broker at the same time. So the time taken to accept a message
is the write time to disk, plus any time spent waiting on
other threads to complete their writes.
ActiveMQ includes a write buffer into which receiving threads write
their messages while they are waiting for the previous write to
complete. The buffer is then written in one operation the next time the
journal is available for writing. Once completed, the threads are then notified.
In this way, the broker maximizes the use of the storage bandwidth.
To minimize the impact of thread contention, it is possible to assign
sets of queues to their own journals through the use of the
mKahaDB adapter. This approach reduces the wait times for writes, as at
any one time threads will likely be writing to different journals and
will not need to compete with each other for exclusive access to any
one journal file.
Transactions
The advantage of using a single journal for all queues is that from
the broker authors' perspective it is much simpler to implement
transactions.
Let us consider an example where multiple messages are sent from a
producer to multiple queues. Using a transaction means that the
entire set of sends must be treated as a single atomic operation. In
this interaction, the ActiveMQ client library is able to make some
optimizations which greatly increase send performance.
In the operation shown in Figure 2-4, the producer sends three
messages, all to different queues. Instead of the normal interaction with
the broker, where each message is acknowledged, the client sends all
three messages asynchronously, that is, without waiting for a
response. These messages are held in memory by the broker. Once
the operation is completed, the producer tells its session to commit,
which in turn causes the broker to perform a single large write with
a single sync operation.
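The saving can be seen by counting sync operations in a toy broker. This is a sketch under simplifying assumptions: the Broker class and its method names are invented, and "write and sync" is reduced to appending a batch to a list and incrementing a counter.

```python
class Broker:
    """Toy broker that counts journal syncs (illustrative only)."""

    def __init__(self):
        self.journal = []         # each entry is one write batch
        self.sync_count = 0
        self.pending = []         # transacted sends held in memory

    def _write_and_sync(self, batch):
        self.journal.append(list(batch))
        self.sync_count += 1

    def send(self, queue, body):
        # Non-transacted persistent send: one write and one sync per message.
        self._write_and_sync([(queue, body)])

    def send_in_transaction(self, queue, body):
        # Transacted send: nothing touches the journal yet.
        self.pending.append((queue, body))

    def commit(self):
        # All pending sends become one large write with a single sync.
        self._write_and_sync(self.pending)
        self.pending = []


sends = [("orders", "m1"), ("billing", "m2"), ("audit", "m3")]

plain = Broker()
for queue, body in sends:
    plain.send(queue, body)

transacted = Broker()
for queue, body in sends:
    transacted.send_in_transaction(queue, body)
transacted.commit()

print(plain.sync_count, transacted.sync_count)
```

Three messages cost three syncs when sent individually and one sync when committed together.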
If you were to compare this with a situation where each queue was
stored in its own journal, then the broker would need to ensure
some form of transactional coordination between each of the writes.
network traffic. However, should the client system shut down,
then the acknowledgements will be lost and the messages will be
re-dispatched and processed a second time. The code must
therefore deal with the likelihood of duplicate messages.
Acknowledgement modes are supplemented by a transactional
consumption facility. When a Session is created, it may be flagged as
being transacted. This means that it is up to the programmer to
explicitly call Session.commit() or Session.rollback(). On the
consumption side, transactions expand the range of interactions that
the code can perform as a single atomic operation. For example, it is
possible to consume and process multiple messages as a single unit,
or to consume a message from one queue and then send to another
queue using the same Session.
If the client terminates between step 2 and step 3, then the
consumption of the message has already affected some other system
Message Ordering
Given a set of messages that arrive in the order [A, B, C, D], and
two consumers C1 and C2, the normal distribution of messages will
be as follows:
C1: [A, C]
C2: [B, D]
Since the broker has no control over the performance of consumer
processes, and since processing is concurrent, the order in which
messages are processed is nondeterministic. If C1 is slower than C2,
the original set of messages could be processed as [B, D, A, C].
This behavior can be surprising to newcomers, who expect that
messages will be processed in order, and who design their messaging
application on this basis. The requirement for messages that were
sent by the same sender to be processed in order relative to each
other, also known as causal ordering, is quite common.
Take as an example the following use case taken from online betting:
from an account that had no funds. There are, of course, ways to get
around this.
The exclusive consumer model involves dispatching all messages
from a queue to a single consumer. Using this approach, when multiple
application instances or threads connect to a queue, they subscribe
with a specific destination option:
my.queue?consumer.exclusive=true. When an exclusive consumer is
connected, it receives all of the messages. When a second consumer
connects, it receives no messages until the first one disconnects.
This second consumer is effectively a warm standby, while the first
consumer will now receive messages in the exact same order as they
were written to the journal, that is, in causal order.
The downside of this approach is that while the processing of
messages is sequential, it is a performance bottleneck as all messages
must be processed by a single consumer.
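The exclusive consumer behavior can be modeled as follows. This is an in-memory sketch, not ActiveMQ code: the ExclusiveQueue class is hypothetical, and it assumes at least one consumer is connected whenever a message is dispatched.

```python
class ExclusiveQueue:
    """Toy queue that dispatches only to the longest-connected consumer."""

    def __init__(self):
        self.consumers = []       # in connection order; index 0 is active
        self.received = {}

    def connect(self, name):
        self.consumers.append(name)
        self.received.setdefault(name, [])

    def disconnect(self, name):
        self.consumers.remove(name)

    def dispatch(self, message):
        # Assumes a consumer is connected; raises IndexError otherwise.
        active = self.consumers[0]
        self.received[active].append(message)


queue = ExclusiveQueue()
queue.connect("C1")
queue.connect("C2")               # warm standby: receives nothing for now
queue.dispatch("A")
queue.dispatch("B")
queue.disconnect("C1")            # standby takes over
queue.dispatch("C")
print(queue.received)
```

Ordering is preserved because only one consumer is ever active, which is also why throughput is capped at what that one consumer can handle.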
To address this type of use case in a more intelligent way, we need to
re-examine the problem. Do all of the messages need to be
processed in order? In the betting use case above, only the messages
related to a single account need to be sequentially processed.
ActiveMQ provides a mechanism for dealing with this situation,
called JMS message groups.
Message groups are a type of partitioning mechanism that allows
producers to categorize messages into groups that will be
sequentially processed according to a business key. This business key is set
into a message property named JMSXGroupID.
The natural key to use in the betting use case would be the account
ID.
To illustrate how dispatch works, consider a set of messages that
arrive in the following order:
[(A, Group1), (B, Group1), (C, Group2), (D, Group3), (E, Group2)]
When a message is processed by the dispatching mechanism in
ActiveMQ, and it sees a JMSXGroupID that has not previously been
seen, that key is assigned to a consumer on a round-robin basis.
From that point on, all messages with that key will be sent to that
consumer.
Here, the groups will be assigned between two consumers, C1 and
C2, as follows:
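Applying the rule just described (a previously unseen group is assigned to the next consumer in round-robin order, and stays with it) to the messages above can be sketched as below. The function dispatch_by_group is a hypothetical simulation of that rule, not ActiveMQ's dispatcher.

```python
from itertools import cycle


def dispatch_by_group(messages, consumers):
    """Assign each new group to the next consumer, then stick with it."""
    next_consumer = cycle(consumers)
    owner = {}                                    # group id -> consumer
    received = {name: [] for name in consumers}
    for body, group_id in messages:
        if group_id not in owner:                 # first time this key is seen
            owner[group_id] = next(next_consumer)
        received[owner[group_id]].append(body)
    return owner, received


arrivals = [
    ("A", "Group1"), ("B", "Group1"), ("C", "Group2"),
    ("D", "Group3"), ("E", "Group2"),
]
owner, received = dispatch_by_group(arrivals, ["C1", "C2"])
print(owner)      # {'Group1': 'C1', 'Group2': 'C2', 'Group3': 'C1'}
print(received)   # {'C1': ['A', 'B', 'D'], 'C2': ['C', 'E']}
```

Messages within a group keep their relative order, while different groups are processed in parallel.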
High Availability
ActiveMQ provides high availability through a master-slave scheme
based on shared storage. In this arrangement, two or more (though
usually two) brokers are set up on separate servers with their
messages being persisted to a message store located at an external
location. The message store cannot be used by multiple broker instances
at the same time, so its secondary function is to act as a locking
mechanism to determine which broker gets exclusive access
(Figure 2-6).
The first broker to connect to the store (Broker A) takes on the role
of the master and opens its ports to messaging traffic. When a
second broker (Broker B) connects to the store, it attempts to acquire
the lock, and as it is unable to, pauses for a short period before
attempting to acquire the lock again. This is known as holding back
in a slave state.
In the meantime, the client alternates between the addresses of the two
brokers in an attempt to connect to an inbound port, known as the
transport connector. Once a master broker is available, the client
connects to its port and can produce and consume messages.
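The lock-based arrangement can be approximated with a small model. Everything here is a simplifying assumption: SharedStore, find_master, and the broker names are hypothetical, the retry pause is omitted, and the final step assumes that a failed master's lock is released so the waiting broker can acquire it.

```python
class SharedStore:
    """Toy message store whose lock decides which broker is the master."""

    def __init__(self):
        self.lock_holder = None

    def try_lock(self, broker):
        if self.lock_holder is None:
            self.lock_holder = broker
        return self.lock_holder == broker

    def release(self, broker):
        if self.lock_holder == broker:
            self.lock_holder = None


def find_master(addresses, store):
    """Client side: try each address until one accepts connections."""
    for address in addresses:
        if store.lock_holder == address:   # only the master has open ports
            return address
    return None


store = SharedStore()
a_is_master = store.try_lock("broker-a")    # first to connect becomes master
b_is_master = store.try_lock("broker-b")    # held back as a slave
connected_to = find_master(["broker-b", "broker-a"], store)

store.release("broker-a")                   # assumed: master process fails
b_takes_over = store.try_lock("broker-b")   # slave's next attempt succeeds
reconnected_to = find_master(["broker-b", "broker-a"], store)
print(connected_to, reconnected_to)
```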
When Broker A, which has held the role of the master, fails due to a
process outage (Figure 2-7), the following events occur:
Logic to alternate between multiple broker addresses is
not guaranteed to be built into the client library, as
it is with the JMS/NMS/CMS implementations. If a
library provides only reconnection to a single address,
then it may be necessary to place the broker pair
behind a load balancer, which also needs to be made
highly available.
There are a number of ways to extract more performance out of a
broker infrastructure:
• Do not use persistence unless you need to. Some use cases
  tolerate message loss on failure, especially ones where one system
  feeds full snapshot state to another over a queue, either
  periodically or on request.
• Run the broker on faster disks. In the field, significant
  differences in write throughput have been seen between standard
  HDDs and memory-based alternatives.
• Make better use of disk dimensions. As shown in the pipe
  model of disk interaction outlined earlier, it is possible to get
  better throughput by using transactions to send groups of
  messages, thereby combining multiple writes into a larger one.
• Use traffic partitioning. It is possible to get better throughput by
  splitting destinations over one of the following:
  • Multiple disks within the one broker, such as by using the
    mKahaDB persistence adapter over multiple directories with
    each mounted to a different disk.
  • Multiple brokers, with the partitioning of traffic performed
    manually by the client application. ActiveMQ does not
    provide any native features for this purpose.
Once this is established, you can drill down into the details to
answer questions such as:
• How big are the messages on each destination? Large messages
  can cause issues in the paging process, leading to memory limits
  being hit and blocking the broker.
• Are the message flows going to be uniform over the course of
  the day, or are there bursts due to batch jobs? Large bursts on
  one less-used queue might interfere with timely disk writes for
  high-throughput destinations.
• Are the systems in the same data center or different ones?
  Remote communication implies some form of broker
  networking.
Summary
In this chapter we have examined the mechanics by which
ActiveMQ receives and distributes messages. We discussed features
that are enabled by this architecture, including sticky load-balancing
of related messages and transactions. In doing so we introduced a
set of concepts common to all messaging systems, including wire
protocols and journals. We also looked in some detail at the
complexities of writing to disk and how brokers can make use of
techniques such as batching writes in order to increase performance.
Finally, we examined how ActiveMQ can be made highly available,
and how it can be scaled beyond the capacity of an individual
broker.
In the next chapter we will take a look at Apache Kafka and how its
architecture reimagines the relationship between clients and brokers
to provide an incredibly resilient messaging pipeline with many
times greater throughput than a regular message broker. We will
discuss the functionality that it trades off to achieve this, and examine
in brief the application architectures that it enables.
CHAPTER 3
Kafka
• Be extremely fast
• Allow massive message throughput
• Support publish-subscribe as well as point-to-point
• Not slow down with the addition of consumers; both queue and
  topic performance degrades in ActiveMQ as the number of
  consumers rises on a destination
• Be horizontally scalable; if a single broker that persists messages
  can only do so at the maximum rate of the disk, it makes sense
  that to exceed this you need to go beyond a single broker
  instance
• Permit the retention and replay of messages
In order to achieve all of this, Kafka adopted an architecture that
redefined the roles and responsibilities of messaging clients and
brokers. The JMS model is very broker-centric, where the broker is
responsible for the distribution of messages, and clients only have to
worry about sending and receiving messages. Kafka, on the other
hand, is client-centric, with the client taking over many of the
functions of a traditional broker, such as fair distribution of related
messages to consumers, in return for an extremely fast and scalable
broker. To people coming from a traditional messaging background,
working with Kafka requires a fundamental shift in perspective.
This engineering direction resulted in a messaging infrastructure
that is capable of many orders of magnitude higher throughput than
a regular broker. As we will see, this approach comes with trade-offs
that mean that Kafka is not suitable for certain types of workloads
and installations.
tained within have been read or not. It is a central part of Kafka's
design that the broker is not concerned with whether its messages
are consumed; that responsibility belongs to the client.
Consuming Messages
A client that wants to consume messages controls a named pointer,
called a consumer group, that points to a message offset in a partition.
The offset is an incrementally numbered position that starts at 0 at
the beginning of the partition. This consumer group, referenced in
the API through a user-defined group_id, corresponds to a single
logical consumer or system.
Most systems that use messaging consume from a destination via
multiple instances and threads in order to process messages in
parallel. As such, there will typically be many consumer instances
sharing the same consumer group.
The problem of consumption can be broken down as follows:
When a consumer instance connects with its own group_id to this
topic, it is assigned a partition to consume from and an offset in that
partition. The position of this offset is configurable within the client
as pointing to either the latest position (the newest message) or the
earliest (the oldest message). The consumer polls for messages from
the topic, which results in reading them sequentially from the log.
The offset position is regularly committed back to Kafka and stored
as messages on an internal topic named __consumer_offsets. Unlike
with a regular broker, the consumed messages are not deleted in any
way, and the client is free to rewind the offset to reprocess messages
that it has already seen.
When a second logical consumer connects with a different
group_id, it controls a second pointer that is independent of the
first (Figure 3-3). As such, a Kafka topic acts like a queue where a
single consumer exists, and like a regular pub-sub topic where
multiple consumers are subscribed, with the added advantage that all
messages are persisted and can be processed multiple times.
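The idea of a consumer group as a named pointer into a log can be made concrete with a small in-memory model. What follows is a minimal sketch under stated assumptions, not Kafka code: the class PartitionLogSketch and its methods are hypothetical, a single list stands in for one partition, and a new group starts from the earliest offset.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for a single partition; hypothetical, not part of any Kafka API. */
class PartitionLogSketch {
    private final List<String> log = new ArrayList<>();
    // One offset per consumer group: the "named pointer" into the log.
    private final Map<String, Integer> offsets = new HashMap<>();

    /** Appends a message and returns its offset (offsets start at 0). */
    int append(String message) {
        log.add(message);
        return log.size() - 1;
    }

    /** Returns up to max messages from the group's offset, then advances that offset. */
    List<String> poll(String groupId, int max) {
        int from = offsets.getOrDefault(groupId, 0); // a new group starts at the earliest message
        int to = Math.min(log.size(), from + max);
        offsets.put(groupId, to);
        return new ArrayList<>(log.subList(from, to));
    }

    /** Moves a group's pointer; nothing is ever deleted from the log. */
    void seek(String groupId, int offset) {
        if (offset < 0 || offset > log.size()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        offsets.put(groupId, offset);
    }
}
```

Two groups polling this log each see every message, because each group only moves its own pointer. Calling seek("billing", 0) lets the hypothetical billing group reprocess the log from the beginning without affecting any other group.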
Figure 3-4. Two consumers in the same consumer group reading from
the same partition
Partitioning
Partitions are the primary mechanism for parallelizing consumption
and scaling a topic beyond the throughput limits of a single broker
instance. To get a better understanding of this, let's now consider the
situation where there exists a topic with two partitions, and a single
consumer subscribes to that topic (Figure 3-5).
Figure 3-6. Two consumers in the same group reading from different
partitions
Sending Messages
The responsibility for deciding which partition to send a message to
is assigned to the producer of that message. To understand the
mechanism by which this is done, we first need to consider what it is
that we are actually sending.
In JMS, we make use of a message construct with metadata
(headers and properties) and a body containing the payload. In
Kafka, the message is a key-value pair. The payload of the message is
sent as the value. The key, on the other hand, is used primarily for
partitioning purposes and should contain a business-specific key in
order to place related messages on the same partition.
In Chapter 2 we discussed a use case from online betting where
related events need to be processed in order by a single consumer:
If each event is a message sent to a topic, then the natural key in this
case would be the account ID.
When a message is sent using the Kafka Producer API, it is handed
to a partitioning function that, given the message and the current
state of the Kafka cluster, returns a partition ID to which the
message should be sent. This function is implemented through the
Partitioner interface in Java.
This interface looks as follows:
interface Partitioner {
    int partition(String topic,
                  Object key, byte[] keyBytes,
                  Object value, byte[] valueBytes,
                  Cluster cluster);
}
The default implementation of the Partitioner uses a general-
purpose hashing algorithm over the key, or round-robin if no key is
provided, to determine the partition. This default works well in the
majority of cases. There will, however, be times when you want to
write your own.
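The default strategy described above can be approximated in a few lines. This is a minimal sketch rather than the client library's implementation: the class KeyHashPartitionerSketch is hypothetical, it takes the partition count directly instead of a Cluster, and Arrays.hashCode stands in for whatever hash function the real partitioner applies to the key bytes.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration of "hash the key, or round-robin if there is none". */
class KeyHashPartitionerSketch {
    private final AtomicInteger roundRobin = new AtomicInteger();

    int partition(byte[] keyBytes, int numPartitions) {
        if (keyBytes == null) {
            // No key: spread messages evenly, with no ordering relationship between them.
            return Math.floorMod(roundRobin.getAndIncrement(), numPartitions);
        }
        // Equal key bytes always yield the same partition, which keeps related
        // messages in order. floorMod avoids negative results for negative hashes.
        return Math.floorMod(Arrays.hashCode(keyBytes), numPartitions);
    }
}
```

Note that the result depends on the number of partitions. The same key may map to a different partition if that number changes, which is one reason why ordering guarantees and partition counts need to be considered together.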
instruction. In such a case, the sending and receiving systems agree
to use a signature to verify the authenticity of the message.
In regular JMS, we would simply define a signature message property
and append it to the message. However, Kafka does not provide
us with a mechanism to transmit metadata, only a key and a value.
Since the value is the bank transfer payload whose integrity we want
to preserve, we are left with no alternative other than defining a data
structure for use within the key. Assuming that we need an account
ID for partitioning, as all messages relating to an account must be
processed in order, we come up with the following JSON structure:
{
    "signature": "541661622185851c248b41bf0cea7ad0",
    "accountId": "10007865234"
}
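Because the signature differs from message to message, hashing the whole key would scatter an account's messages across partitions. A partitioning function for this key would therefore need to hash the account ID alone. The following is a minimal sketch under stated assumptions: the class AccountIdPartitionerSketch is hypothetical, the key is assumed to arrive as UTF-8 JSON in the shape shown above, and a regular expression is used in place of a proper JSON parser purely for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that partitions on the accountId field of a JSON key. */
class AccountIdPartitionerSketch {
    // Assumes the field is a simple quoted string, as in the example key.
    private static final Pattern ACCOUNT_ID =
            Pattern.compile("\"accountId\"\\s*:\\s*\"([^\"]+)\"");

    static String accountId(byte[] keyBytes) {
        String json = new String(keyBytes, StandardCharsets.UTF_8);
        Matcher m = ACCOUNT_ID.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("key has no accountId field");
        }
        return m.group(1);
    }

    /** Ignores the signature so that all messages for an account share a partition. */
    static int partition(byte[] keyBytes, int numPartitions) {
        return Math.floorMod(accountId(keyBytes).hashCode(), numPartitions);
    }
}
```

In a real client this logic would sit behind the Partitioner interface shown earlier, with the partition count taken from the cluster state rather than passed in as a parameter.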
It is important to note that however you decide to partition the
messages, the partitioner itself may need to be reused.
Consider the requirement to replicate data between Kafka clusters in
different geographical locations. Kafka comes with a standalone
command-line tool called MirrorMaker for this purpose, used to
consume messages from one cluster and produce them into another.
MirrorMaker needs to understand the keys of the topic being
replicated in order to maintain relative ordering between messages as it
replicates between clusters, as the number of partitions for that topic
may not be the same in the two clusters.
Custom partitioning strategies are relatively rare, as the defaults of
hashing or round-robin work for the majority of use cases. If,
however, you require strong ordering guarantees or need to move
metadata outside of payloads, then partitioning is something that you
will need to consider in closer detail.
Kafka's scalability and performance benefits come from moving
some of the responsibilities of a traditional broker onto the client. In
this case, that responsibility is the decision about how to distribute
potentially related messages to multiple consumers running in parallel.
Producer Considerations
Partitioning is not the only thing that needs to be considered when
sending messages. Let us consider the send() methods on the
Producer interface in the Java API:
Future<RecordMetadata> send(ProducerRecord<K,V> record);
Future<RecordMetadata> send(ProducerRecord<K,V> record,
                            Callback callback);
The immediate thing to note is that both methods return a Future,
which indicates that the send operation is not performed
immediately. What happens is that the message (ProducerRecord) is written
into a send buffer for each active partition and transmitted on to the
broker by a background thread within the Kafka client library.
While this makes the operation incredibly fast, it does mean that a
naively written application could lose messages if its process goes
down.
As always, there is a way of making the sending operation more
reliable at the cost of performance. The size of this buffer can be tuned
to be 0, and the sending application thread can be forced to wait
until the transmission of the message to the broker has been
completed, as follows:
RecordMetadata metadata = producer.send(record).get();
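The difference between handing a message to a background sender and waiting on the returned Future can be seen without a broker. The sketch below is a model built on assumptions, not the Kafka client: BufferedSenderSketch is a hypothetical class, a single-threaded executor plays the role of the client's background thread, and a short sleep stands in for the network round trip.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical model of an asynchronous sender; not the Kafka Producer. */
class BufferedSenderSketch implements AutoCloseable {
    private final ExecutorService background = Executors.newSingleThreadExecutor();
    private final AtomicLong nextOffset = new AtomicLong();

    /** Returns at once; the Future completes when the "broker" has acknowledged. */
    Future<Long> send(String message) {
        return background.submit(() -> {
            Thread.sleep(10); // simulated network round trip
            return nextOffset.getAndIncrement();
        });
    }

    /** Blocks the calling thread until the send has completed, like send(record).get(). */
    long sendAndWait(String message) {
        try {
            return send(message).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for send", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("send failed", e.getCause());
        }
    }

    @Override
    public void close() {
        background.shutdown();
    }
}
```

A caller that uses send() alone continues before the message has gone anywhere, so a crash at that point loses whatever is still queued. A caller that uses sendAndWait() pays the round-trip time on every message but knows the outcome before moving on.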
Consumption Revisited
The consumption of messages has additional complexities that need
to be reasoned about. Unlike the JMS API, which can trigger a
message listener in reaction to the arrival of a message, Kafka's Consumer
interface is polling only. Let's take a closer look at the poll()
method used for this purpose:
ConsumerRecords<K,V> poll(long timeout);
The method's return value is a container structure containing
multiple ConsumerRecord objects from potentially multiple partitions.
The ConsumerRecord itself is a holder object for a key-value pair,
with associated metadata such as which partition it came from.
As discussed in Chapter 2, we need to constantly keep in mind what
happens to messages once they are either successfully processed or
not, such as if the client is unable to process a message or if it
terminates. In JMS, this was handled through the acknowledgement
mode. The broker would either delete a successfully processed
message or redeliver an unprocessed or failed one (assuming
transactions were in play). Kafka works quite differently. Messages are not
deleted on the broker once consumed, and the responsibility for
working out what happens on failure lies with the consuming code
itself.
As we have discussed, a consumer group is associated with an offset
in the log. The position in the log associated with that offset
corresponds to the next message to be handed out in response to a
poll(). What is critical in consumption is the timing around when
that offset is incremented.
Going back to the consumption model discussed earlier, the
processing of a message has three phases:
This mode also has the same potential as in 0.9: messages may have
been processed, but in the event of failure, the offset may not have
been committed, leading to potential duplicate delivery. The more
messages you retrieve during a poll(), the greater this problem is.
As discussed in "Consuming Messages from a Queue" on page 21,
there is no such thing as once-only message delivery in a messaging
system once you take failure modes into account.
In Kafka, there are two ways to commit the offset: automatically and
manually. In both cases, you may process the messages multiple
times if you have processed the message but failed before
committing. You may also not process the message at all if the commit
happened in the background and your code terminated before it got
around to processing (a possibility in Kafka 0.9 and earlier).
Controlling the offset commit process manually is enabled in the
Kafka consumer API by setting enable.auto.commit to false
and calling one of the following methods explicitly:
void commitSync();
void commitAsync();
If you care about at-least-once processing, you would commit the
offset manually via commitSync(), executed immediately after your
processing of messages.
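Why committing after processing gives at-least-once behavior, and why the size of a polled batch matters, can be shown with a small simulation. This is a minimal sketch and not consumer code: the class AtLeastOnceSketch is hypothetical, a list stands in for the partition log, and the crashAfter parameter is an invented device for simulating a process that dies partway through a batch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation of process-then-commit consumption. */
class AtLeastOnceSketch {
    private final List<String> log;
    private int committedOffset = 0; // held by the broker, so it survives a consumer restart
    final List<String> processed = new ArrayList<>();

    AtLeastOnceSketch(List<String> log) {
        this.log = log;
    }

    /**
     * Polls one batch, processes it, then commits the offset.
     * If crashAfter >= 0, the consumer "dies" after processing that many
     * messages of the batch, before the commit step is reached.
     * A negative crashAfter means no crash.
     */
    void pollProcessCommit(int batchSize, int crashAfter) {
        int end = Math.min(log.size(), committedOffset + batchSize);
        int handled = 0;
        for (int i = committedOffset; i < end; i++) {
            if (handled == crashAfter) {
                return; // crash mid-batch: the offset is never committed
            }
            processed.add(log.get(i));
            handled++;
        }
        if (handled == crashAfter) {
            return; // crash after the last message but before the commit
        }
        committedOffset = end; // the step that commitSync() performs
    }
}
```

If a batch of three messages is interrupted after two have been processed, the restarted consumer begins again from the last committed offset and handles those two messages a second time. No message is skipped, but duplicates are possible, and a larger batch means more potential duplicates.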
The commit methods prevent messages from being acknowledged before
being processed, but do nothing to address the potential for
duplicate processing, while at the same time giving the impression of
transactionality. Kafka is nontransactional. There is no way for a
client to:
ture depends on many independent machines working as a single
bus, and no attempt is made to hide this. For instance, there
is no API construct that would enable the tying together of a
Consumer and Producer in a transaction; in JMS this is mediated
by the Session from which MessageProducers and
MessageConsumers are created.
High Availability
Kafka's approach to high availability is significantly different from
that of ActiveMQ. Kafka is designed around horizontally scalable
clusters in which all broker instances accept and distribute messages
at the same time.
A Kafka cluster is made up of multiple broker instances running on
separate servers. Kafka has been designed to run on commodity
standalone hardware, with each node having its own dedicated
storage. The use of storage area networks (SANs) is discouraged as
many compute nodes may compete for time slices of the storage and
create contention.
Kafka is an always-on system. A lot of large Kafka users never take
their clusters down, and the software always provides an upgrade
path via a rolling restart. This is managed by guaranteeing
compatibility with the previous version for messages and inter-broker
communication.
Brokers are connected to a cluster of ZooKeeper servers that acts
as a registry of configuration information and is used to coordinate
the roles of each broker. ZooKeeper is itself a distributed system
that provides high availability through replication of information in
a quorum setup.
At its most basic, a topic is created on the Kafka cluster with the
following properties:
node that contains the logs for the partition is referred to as a replica.
A broker may act as a leader for some partitions and as a follower
for others.
A follower that contains all of the messages held by the leader is
referred to as being an in-sync replica. Should the broker acting as
leader for a partition go offline, any broker that is up-to-date or
in-sync for that partition may take over as leader. This is an incredibly
resilient design.
Part of the producer's configuration is an acks setting, which
dictates how many replicas must acknowledge receipt of the
message before the application thread continues when it is sent: 0, 1,
or all. If set to all, on receipt of a message, the leader will send
confirmation back to the producer once it has received
acknowledgements of the write from a number of replicas (including itself),
defined by the topic's min.insync.replicas setting (1 by default). If
a message cannot be successfully replicated, then the producer will
raise an exception to the application (NotEnoughReplicas or
NotEnoughReplicasAfterAppend).
A typical configuration is to create a topic with a replication factor
of 3 (1 leader, 2 followers for each partition) and set
min.insync.replicas to 2. That way the cluster will tolerate one of
the brokers managing a topic partition going offline with no impact
on client applications.
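The arithmetic behind this configuration is simple enough to write down. The sketch below is illustrative only: the class ReplicationSketch and its methods are hypothetical, the only real names are the acks and min.insync.replicas settings discussed above, and the model ignores everything about a cluster except the count of in-sync replicas for one partition.

```java
import java.util.Properties;

/** Hypothetical helper illustrating acks=all with min.insync.replicas. */
class ReplicationSketch {
    /** Producer setting that asks for the strongest acknowledgment level. */
    static Properties producerConfig() {
        Properties props = new Properties();
        props.setProperty("acks", "all");
        return props;
    }

    /** With acks=all, a write is accepted only while enough replicas are in sync. */
    static boolean writeAccepted(int inSyncReplicas, int minInsyncReplicas) {
        return inSyncReplicas >= minInsyncReplicas;
    }

    /** Replicas of a partition that can be lost before such writes are rejected. */
    static int toleratedFailures(int replicationFactor, int minInsyncReplicas) {
        return replicationFactor - minInsyncReplicas;
    }
}
```

For a replication factor of 3 and min.insync.replicas of 2, toleratedFailures returns 1: with one replica offline, two remain in sync and writes continue, whereas losing a second replica causes acks=all writes to that partition to fail.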
This brings us back to the now-familiar performance versus
reliability trade-off. Replication comes at the cost of additional time
waiting for acknowledgments from followers, although as it is
performed in parallel, replication to a minimum of three nodes has
similar performance to that of two (ignoring the increased network
bandwidth usage).
Using this replication scheme, Kafka cleverly avoids the need to
ensure that every message is physically written to disk via a sync()
operation. Each message sent by a producer will be written to the
partition's log, but as discussed in Chapter 2, a write to a file is
initially performed into an operating system buffer. If that message is
replicated to another Kafka instance and resides in its memory, loss
of the leader does not mean that the message itself was lost; the
in-sync replica can take over.
Avoiding the need to sync() means that Kafka can accept messages
at the rate at which it can write into memory. Conversely, the longer
it can avoid flushing its memory to disk, the better. For this reason it
is not unusual to see Kafka brokers assigned 64 GB of memory or
more. This use of memory means that a single Kafka instance can
easily operate at speeds many thousands of times faster than a
traditional message broker.
Kafka can also be configured to sync() batches of messages. As
everything in Kafka is geared around batching, this actually
performs quite well for many use cases and is a useful tool for users that
require very strong guarantees. Much of Kafka's raw performance
comes from messages that are sent to the broker as batches, and
from having those messages read from the broker in sequential
blocks via zero-copy. The latter is a big win from a performance and
resource perspective, and is only possible due to the use of the
underlying journal data structure, which is laid out per partition.
Much higher performance is possible across a Kafka cluster than
through the use of a single Kafka broker, as a topic's partitions may
be horizontally scaled over many separate machines.
Summary
In this chapter we looked at how Kafka's architecture reimagines the
relationship between clients and brokers to provide an incredibly
resilient messaging pipeline with many times greater throughput
than a regular message broker. We discussed the functionality that it
trades off to achieve this, and examined in brief the application
architectures that it enables. In the next chapter we will look at
common concerns that messaging-based applications need to deal with
and discuss strategies for addressing them. We will complete the
chapter by outlining how to reason about messaging technologies in
general so that you can evaluate their suitability to your use cases.
CHAPTER 4
Messaging Considerations
and Patterns
Reconnection involves cycling through the set of known addresses
for a broker, with delays in between. The exact details vary between
client libraries.
While a broker is unavailable, the application thread performing the
send may be blocked from performing any additional work if the
send operation is synchronous. This can be problematic if that
thread is reacting to outside stimuli, such as responding to a web
service request.
If all of the threads in a web server's thread pool are suspended
while they are attempting to communicate with a broker, the server
will begin rejecting requests back to upstream systems, typically
with HTTP 503 Service Unavailable. This situation is referred to
as back-pressure, and is nontrivial to address.
One possibility for ensuring that an unreachable broker does not
exhaust an application's resource pool is to implement the circuit
breaker pattern around messaging code (Figure 4-1). At a high level,
a circuit breaker is a piece of logic around a method call that is used
to prevent threads from accessing a remote resource, such as a
broker, in response to application-defined exceptions.
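To make the pattern concrete, here is a minimal sketch of a circuit breaker under stated assumptions. The class CircuitBreakerSketch is hypothetical, a RuntimeException thrown by the wrapped call is treated as the application-defined failure, and the current time is passed in as a parameter so that the behavior is easy to exercise. A production implementation would more likely come from an established library.

```java
import java.util.function.Supplier;

/** Hypothetical circuit breaker around a call to a remote resource such as a broker. */
class CircuitBreakerSketch {
    private final int failureThreshold; // consecutive failures before the circuit opens
    private final long openMillis;      // how long calls are rejected once it is open
    private int consecutiveFailures = 0;
    private boolean open = false;
    private long openedAt = 0L;

    CircuitBreakerSketch(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    <T> T call(Supplier<T> action, long nowMillis) {
        synchronized (this) {
            if (open && nowMillis - openedAt < openMillis) {
                // Fail fast so the calling thread is not tied up waiting on the broker.
                throw new IllegalStateException("circuit open: call rejected");
            }
            // Otherwise the circuit is closed, or open long enough to allow a trial call.
        }
        try {
            T result = action.get(); // performed outside the lock so callers do not queue
            synchronized (this) {
                consecutiveFailures = 0;
                open = false;
            }
            return result;
        } catch (RuntimeException e) {
            synchronized (this) {
                if (++consecutiveFailures >= failureThreshold) {
                    open = true;
                    openedAt = nowMillis;
                }
            }
            throw e;
        }
    }
}
```

The messaging send would be passed in as the action. While the circuit is open, request-handling threads receive an immediate exception that the application can translate into a fallback or an error response, instead of blocking until the broker returns.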
If you are intending to send large messages, check how the system
deals with them. Do you need some form of additional external
storage outside of the messaging system, such as when using the
Claim Check pattern? Or is there some form of built-in support for
streaming? If streaming very large content like video, do you need
persistence at all?
Do you need low latency? If so, how low? Different business
domains will have different views on this. Intermediary systems
such as brokers add processing time between production and
consumption; perhaps you should consider brokerless options such as
ZeroMQ or an AMQP routing setup?
Consider the interaction between messaging counterparties. Are you
going to be performing request-response over messaging? Does the
messaging system support the sorts of constructs that are required,
i.e., message headers, temporary destinations, and selectors?
smaller maximum disk requirements than Kafka: gigabytes versus
terabytes.
Kafka's log-based design means that messages are not deleted when
consumed, and as such can be processed many times. This enables a
completely different category of applications to be built: ones
that can consider the messaging layer as a source of historical data
and can use it to build application state.
ActiveMQ's requirements lead to a design that is limited by the
performance of its storage and relies on a high-availability mechanism
requiring multiple servers, of which some are not in use while in
slave mode. Where messages are physically located matters a lot
more than it does in Kafka. To provide horizontal scalability, you
need to wire brokers together into store-and-forward networks,
then worry about which one is responsible for messages at any given
point in time.
Kafka requires a much more involved system involving a ZooKeeper
cluster and requires an understanding of how applications will make
use of the system (e.g., how many consumers will exist on each
topic) before it is configured. It relies upon the client code taking
over the responsibility of guaranteeing ordering of related messages,
and correct management of consumer group offsets while dealing
with message failures.
Do not believe the myth of a magical messaging fabric, a system
that will solve all problems in all operating environments. As with
any technology area there are trade-offs, even within messaging
systems in the same general category. These trade-offs will quite often
impact how your applications are designed and written.
Your choices in this area should be led first and foremost by a good
understanding of your own use cases, desired design outcomes, and
target operating environment. Spend some time looking into the
details of a messaging product before jumping in. Ask questions:
64 | Chapter 5: Conclusion
I hope that this book has given you an appreciation of some of the
mechanics and trade-offs of broker-based messaging systems, and
will help you to consider these products in an informed way. Happy
messaging!
About the Author
Jakub Korab is a UK-based specialist messaging and integration
consultant who runs his own consultancy, Ameliant. Over the past
six years he has worked with over 100 clients around the world to
design, develop, and troubleshoot large, multisystem integrations
and messaging installations using a set of open source tools from the
Apache Software Foundation. His experience has spanned industries
including finance, shipping, logistics, aviation, industrial IoT, and
space exploration. He is coauthor of the Apache Camel Developer's
Cookbook (Packt, 2013), and an international speaker who has
presented at conferences across Europe, including Devoxx UK,
JavaZone (Norway), Voxxed Days (Bristol, Belgrade, and Ticino),
DevWeek, and O'Reilly SACon. Prior to going independent, he
worked for the company behind ActiveMQ (FuseSource, later
acquired by RedHat) and is currently partnering with Confluent,
the company behind Kafka.