Kafka Streams in Action, Second Edition
1. About this book
2. Acknowledgements
3. Preface
4. PART 1: INTRODUCTION
5. 1 Welcome to the Kafka Event Streaming Platform
6. 2 Kafka Brokers
7. PART 2: GETTING DATA INTO KAFKA
8. 3 Schema Registry
9. 4 Kafka Clients
10. 5 Kafka Connect
11. PART 3: EVENT STREAM PROCESSING DEVELOPMENT
12. 6 Developing Kafka Streams
13. 7 Streams and State
14. 8 The KTable API
15. 9 Windowing and Timestamps
16. 10 The Processor API
17. 11 ksqlDB
18. 12 Spring Kafka
19. 13 Kafka Streams interactive queries
20. 14 Testing
21. Appendix A. Schema Compatibility Workshop
22. Appendix B. Working with Avro, Protobuf and JSON Schema
23. Appendix C. Understanding Kafka Streams architecture
24. Appendix D. Confluent Resources
25. Index
About this book
I wrote the second edition of Kafka Streams in Action to teach you how to build event streaming applications with Kafka Streams and the other components of the Kafka ecosystem: the producer and consumer clients, Connect, and Schema Registry. I took this approach because, for your event-streaming application to be as effective as possible, you'll need not just Kafka Streams but these other essential tools as well. My approach to writing this book is a pair-programming perspective; I imagine myself sitting next to you as you write the code and learn the API. You'll learn about the Kafka broker and how the producer and consumer clients work. Then, you'll see how to manage schemas, their role with Schema Registry, and how Kafka Connect bridges external components and Kafka. From there, you'll dive into Kafka Streams, first building a simple application, then adding more complexity as you dig deeper into the Kafka Streams API. You'll also learn about ksqlDB, testing, and, finally, integrating Kafka with the popular Spring Framework.
Part 1 introduces event streaming and describes the different parts of the Kafka ecosystem to show you the big-picture view of how it all works and fits together. These chapters also provide the basics of the Kafka broker for those who need them or want a review.
Part 2 moves on and covers getting data into and out of Kafka and managing schemas.

Chapter 3 covers using Schema Registry to help you manage the evolution of your data's schemas. Spoiler alert: you're always using a schema; if not explicitly, then it's implicitly there.

Chapter 4 discusses the Kafka producer and consumer clients. The clients are how you get data into and out of Kafka and provide the building blocks for Kafka Connect and Kafka Streams.

Chapter 5 is about Kafka Connect. Kafka Connect provides the ability to get data into Kafka via source connectors and export it to external systems with sink connectors.
Part 3 gets to the book's heart and covers developing Kafka Streams applications. In this section, you'll also learn about ksqlDB and testing your event-streaming application, and it concludes with integrating Kafka with the Spring Framework.

Chapter 6 is your introduction to Kafka Streams, where you'll build a Hello World application and, from there, a more realistic application for a fictional retailer. Along the way, you'll learn about the Kafka Streams DSL.

Chapter 7 continues your Kafka Streams learning path, where we discuss application state and why it's required for streaming applications. Among the things you'll learn about in this chapter are aggregating data and joins.

Chapter 8 covers the KTable API. Whereas a KStream is a stream of events, a KTable is a stream of related events, or an update stream.

Chapter 9 covers windowed operations and timestamps. Windowing an aggregation allows you to bucket results by time, and the timestamps on the records drive the action.

Chapter 10 dives into the Kafka Streams Processor API. Up to this point, you've been working with the high-level DSL, but here you'll learn how to use the Processor API when you need more control.

Chapter 11 takes you further into the development stack, where you'll learn about ksqlDB. ksqlDB allows you to write event-streaming applications without any code, using only SQL.

Chapter 12 discusses using the Spring Framework with Kafka clients and Kafka Streams. Spring allows you to write more modular and testable code by providing a dependency injection framework for wiring up your applications.

Chapter 13 introduces you to Kafka Streams Interactive Queries, or IQ. IQ is the ability to directly query the state store of a stateful operation in Kafka Streams. You'll use what you learned in Chapter 12 to build a Spring-enabled IQ web application.

Chapter 14 covers the all-important topic of testing. You'll learn how to test client applications and a Kafka Streams topology, the difference between unit testing and integration testing, and when to apply them.

Appendix A contains a workshop on Schema Registry to get hands-on experience with the different schema compatibility modes.

Appendix B is a survey of working with the different schema types: Avro, Protobuf, and JSON Schema.

Appendix C covers the architecture and internals of Kafka Streams.

Appendix D presents information on using Confluent Cloud to help develop your event streaming applications.
In many cases, the original source code has been reformatted; we've added line breaks and reworked indentation to accommodate the available page space in the book. In rare cases, even this was not enough, and listings include line-continuation markers (➥).

Additionally, comments in the source code have often been removed from the listings when the code is described in the text. Code annotations accompany many of the listings, highlighting important concepts.

Finally, it's important to note that many of the code examples aren't meant to stand on their own: they're excerpts containing only the most relevant parts of what is currently under discussion. You'll find all the examples from the book in the accompanying source code in their complete form.
Last but certainly not least, I thank the reviewers for their hard work and
invaluable feedback in making the quality of this book better for all readers.
Preface
After completing the first edition of Kafka Streams in Action, I thought that I
had accomplished everything I had set out to do. But as time went on, my
understanding of the Kafka ecosystem and my appreciation for Kafka
Streams grew. I saw that Kafka Streams was more powerful than I had
initially thought. Additionally, I noticed other important pieces in building
event-streaming applications; Kafka Streams is still a key player but not the
only requirement. I realized that Apache Kafka could be considered the
central nervous system for an organization’s data. If Kafka is the central
nervous system, then Kafka Streams is a vital organ performing some
necessary operations.
But Kafka Streams relies on other components to bring events into Kafka or
export them to the outside world where its results and calculations can be put
to good use. I’m talking about the producer and consumer clients and Kafka
Connect. As I put the pieces together, I realized you need these other
components to complete the event-streaming picture. Couple all this with
some significant improvements to Kafka Streams since 2018, and I knew I
wanted to write a second edition.
But I didn’t just want to brush up on the previous edition; I wanted to express
my improved understanding and add complete coverage of the entire Kafka
ecosystem. This meant expanding the scope of some subjects from sections
of chapters to whole chapters (like the producer and consumer clients), or
adding entirely new chapters (such as the new chapters on Connect and
Schema Registry). For the existing Kafka Streams chapters, writing a second
edition meant updating and improving the existing material to clarify and
communicate my deeper understanding.
Taking on the second edition with this new focus during the pandemic was
not easy and not without some serious personal challenges along the way. But
in the end, it was worth every minute of it, and if I were to go back in time, I
would make the same decision. I hope that new readers of Kafka Streams in
Action will find the book an essential resource and that readers from the first
edition will enjoy and apply the improvements as well.
PART 1: INTRODUCTION
In part one, you’ll learn about events and event streaming in general. Event
streaming is a software development approach that considers events as an
application’s primary input and output. But to develop an effective event
streaming application, you’ll first need to learn what an event is (spoiler alert:
it’s everything!). Then you’ll read about what use cases are good candidates
for event-streaming applications and which are not.
First, you'll discover what a Kafka broker is, how it sits at the heart of the Kafka ecosystem, and the various jobs it performs. Next, you'll learn what Schema Registry, the producer and consumer clients, Connect, and Kafka Streams are and the different roles they play. Then you'll learn about the Apache Kafka event streaming platform; although this book focuses on Kafka Streams, Kafka Streams is part of a larger whole that allows you to develop event-streaming applications. If this first part leaves you with more questions than answers, don't fret; I'll explain them all in subsequent chapters.
1 Welcome to the Kafka Event
Streaming Platform
This chapter covers
Defining event streaming and events
Introducing the Kafka event streaming platform
Applying the platform to a concrete example
First, the software systems consume and store all the information obtained
from your interaction and the interactions of other subscribers. Then,
additional software systems use that information to make recommendations
to you and to provide the streaming service with insight on what
programming to provide in the future. Now, consider that this process occurs
hundreds of thousands or even millions of times per day, and you can see the
massive amount of information that businesses need to harness and that their
software needs to make sense of to meet customer demands and expectations
and stay competitive.
Processing the event stream in real time is essential for making time-sensitive
decisions. For example, Does this purchase from customer X seem
suspicious? Are the signals from this temperature sensor indicating
something has gone wrong in a manufacturing process? Has the routing
information been sent to the appropriate department of a business?
Since everything in life can be considered an event, any problem domain will benefit from processing event streams. But there are some areas where it's more important to do so, such as the fraud detection, sensor monitoring, and order-routing scenarios just mentioned.
But streaming applications are only a fit for some situations. Event-streaming
applications become necessary when you have data in different places or a
large volume of events requiring distributed data stores. So, if you can
manage with a single database instance, streaming is unnecessary. For
example, a small e-commerce business or a local government website with
primarily static data aren’t good candidates for building an event-streaming
solution.
In this book, you’ll learn about event-stream development, when and why it’s
essential, and how to use the Kafka event-streaming platform to build robust
and responsive applications. You’ll learn how to use the Kafka streaming
platform’s various components to capture events and make them available for
other applications. We’ll cover using the platform’s components for simple
actions such as writing (producing) or reading (consuming) events to
advanced stateful applications requiring complex transformations so you can
solve the appropriate business challenges with an event-streaming approach.
This book is suitable for any developer looking to get into building event-
streaming applications.
Although the title, "Kafka Streams in Action," focuses on Kafka Streams, this
book teaches the entire Kafka event-streaming platform, end to end. That
platform includes crucial components, such as producers, consumers, and
schemas, that you must work with before building your streaming apps,
which you’ll learn in Part 1. As a result, we don’t get into the subject of
Kafka Streams itself until later in the book, in Chapter 6. But the enhanced
coverage is worth it; Kafka Streams is an abstraction built on top of
components of the Kafka event streaming platform, so understanding them
gives you a better grasp of how you can use Kafka Streams.
I’ve used a lot of different terms in this introduction, so let’s wrap this section
up with a table of definitions:
Figure 1.1 A sequence of events comprising an event stream starting with the online purchase of the flux capacitor
1. You complete the purchase on the retailer’s website, and the site
provides a tracking number.
2. The retailer’s warehouse receives the purchase event information and
puts the Flux Capacitor on a shipping truck, recording the date and time
your purchase left the warehouse.
3. The truck arrives at the airport, and the driver loads the Flux Capacitor
on a plane and scans a barcode recording the date and time.
4. The plane lands, and the package is loaded on a truck again headed for
the regional distribution center. The delivery service records the date
and time they loaded your Flux Capacitor.
5. The truck from the airport arrives at the regional distribution center. A
delivery service employee unloads the Flux Capacitor, scanning the date
and time of the arrival at the distribution center.
6. Another employee takes your Flux Capacitor, scans the package, saves
the date and time, and loads it on a truck bound for delivery to you.
7. The driver arrives at your house, scans the package one last time, and
hands it to you. You can start building your time-traveling car!
From our example here, you can see how everyday actions create events,
hence an event stream. The individual events are the initial purchase, each
time the package changes custody, and the final delivery. This scenario
represents events generated by just one purchase. But if you think of the
event streams generated by purchases from Amazon and the various shippers
of the products, the number of events could easily number in the billions or
trillions.
Figure 1.2 Initial event-streaming architecture leads to complexity as the different departments
and data stream sources need to be aware of the other sources of events
In the above illustration, individual departments create separate
infrastructures to meet their requirements. However, other departments may
be interested in consuming the same data, which leads to a more complicated
architecture to connect the various input streams.
Let’s look at how the Kafka event streaming platform can change things.
Figure 1.3 Using the Kafka event streaming platform, the architecture is simplified
As you can see from this updated illustration, adding the Kafka event
streaming platform dramatically simplifies the architecture. All components
now send their records to Kafka. Additionally, consumers read data from
Kafka with no awareness of the producers.
Figure 1.4 You deploy brokers in a cluster, and brokers replicate data for durable storage
This illustration shows that Kafka brokers are the storage layer within the
Kafka architecture and sit in the "storage" portion of the event-streaming
trilogy. But in addition to acting as the storage layer, the brokers provide
other essential functions such as serving client requests and coordinating with
consumers. We’ll go into details of broker functionality in Chapter 2.
The Producer client is responsible for sending records into Kafka. The
consumer is responsible for reading records from Kafka. These two clients
form the basic building blocks for creating an event-driven application and
are agnostic to each other, allowing for greater scalability. The producer and consumer clients also form the foundation for any higher-level abstraction working with Apache Kafka. We cover clients in Chapter 4.
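To give you a feel for what the producer client looks like in code, here is a minimal Java sketch that sends a single key-value record. The topic name (purchases), key, value, and local broker address are assumptions for illustration; Chapter 4 covers the producer and consumer APIs properly.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Broker address and serializers the producer needs to connect and encode records
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // Send one key-value record to a hypothetical "purchases" topic
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("purchases", "customer-123", "flux capacitor"));
        }
    }
}

The consumer client follows the same pattern with deserializers and a poll loop; both are covered in depth in Chapter 4.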
Let’s pause our scenario to discuss the relationship between these simple
events and how they interact with the Kafka event streaming platform.
The initial clicks to navigate to the site and print the coupons generate clickstream information, which is captured and produced directly into Kafka by a producer microservice. The marketing department started a new campaign and wants to measure its effectiveness, so the clickstream events available here are valuable.

The first sign of a successful project is that users click on the email links to retrieve the coupons. The data science group is also interested in the pre-purchase clickstream data: the team can track customers' actions and attribute purchases to those initial clicks and marketing campaigns. The amount of data from this single activity may seem minor, but factor in a large customer base and several different marketing campaigns, and you have a significant amount of data.
It’s late summer, and Jane has meant to go shopping to get her children back-
to-school supplies. Since tonight is a rare night with no family activities, Jane
stops off at ZMart on her way home.
Walking through the store after grabbing everything she needs, Jane walks by
the footwear section and notices some new designer shoes that would go
great with her new suit. She realizes that’s not what she came in for, but what
the heck? Life is short (ZMart thrives on impulse purchases!), so Jane gets
the shoes.
As Jane reaches the self-checkout aisle, she scans her ZMart member card.
After scanning all the items, she scans the coupon, which reduces the
purchase by 15%. Then Jane pays for the transaction with her debit card,
takes the receipt, and walks out of the store. A little later that evening, Jane checks her email and finds a message from ZMart thanking her for her patronage, with coupons for discounts on a new line of designer clothes.
Let's dissect the purchase transaction and see how this event triggers a sequence of operations performed by the Kafka event streaming platform.
So now ZMart’s sales data streams into Kafka. In this case, ZMart uses Kafka
Connect to create a source connector to capture the sales as they occur and
send them to Kafka. The sale transaction brings us to the first requirement: the protection of customer data. In this case, ZMart uses an SMT, or Single Message Transform, to mask the credit card data as it goes into Kafka.
Figure 1.10 Sending all of the sales data directly into Kafka, with Connect masking the credit card numbers as part of the process
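To make the idea concrete, here is a sketch of what such a connector configuration could look like when submitted to Connect's REST API. The connector class, its connector-specific settings (including the topic key), and the credit_card_number field name are hypothetical; the MaskField transform shown is one of Kafka Connect's built-in SMTs, and Chapter 5 covers Connect configuration in detail.

{
  "name": "zmart-sales-source",
  "config": {
    "connector.class": "com.example.SalesSourceConnector",
    "tasks.max": "1",
    "topic": "sales",
    "transforms": "maskCC",
    "transforms.maskCC.type": "org.apache.kafka.connect.transforms.MaskField$Value",
    "transforms.maskCC.fields": "credit_card_number"
  }
}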
As Connect writes records into Kafka, different organizations within ZMart immediately consume them. The department in charge of promotions created an application that consumes the sales data and assigns purchase rewards to customers who are loyalty club members. If a customer reaches the threshold for earning a bonus, an email with a coupon goes out to that customer.
Figure 1.11 Marketing department application for processing customer points and sending out emails with earned rewards
It’s important to note that ZMart processes sales records immediately after
the sale. So, customers get timely emails with their rewards within a few
minutes of completing their purchases. Acting on the purchase events as they
happen allows ZMart a quick response time to offer customer bonuses.
The data science group within ZMart uses the sales data topic as well. The DS group runs a Kafka Streams application that processes the sales data, building up patterns of what customers in different locations are purchasing the most. The Kafka Streams application crunches the data in real time and sends the results to a sales-trends topic.
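As a preview of what such an application looks like, here is a minimal Kafka Streams sketch that counts sales per location and writes the counts to a sales-trends topic. The topic names, the assumption that records are keyed by store location with the item as the value, and the String/Long serdes are all illustrative; you'll start building real Kafka Streams applications in Chapter 6.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class SalesTrendsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sales-trends-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Assume sales records are keyed by store location; count purchases per location
        KStream<String, String> sales = builder.stream("sales");
        KTable<String, Long> countsByLocation = sales.groupByKey().count();
        // Write the continuously updated counts out to the sales-trends topic
        countsByLocation.toStream()
            .to("sales-trends", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}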
Figure 1.12 Kafka Streams application crunching sales data and Kafka Connect exporting the
data for a dashboard application
ZMart uses another Kafka connector to export the sales trends to an external
application that publishes the results in a dashboard. Another group also consumes from the sales topic to keep track of inventory; if stock for a product drops below a given threshold, it signals the need to order more of that product.
At this point, you can see how ZMart leverages the Kafka platform. It is important to remember that with an event streaming approach, ZMart responds to data as it arrives, allowing it to make quick and efficient decisions. Also, note how the data is written into Kafka once, yet multiple groups consume it independently and at different times, so one group's activity doesn't impede another's.
1.5 Summary
Event streaming captures events generated from different sources like
mobile devices, customer interaction with websites, online activity,
shipment tracking, and business transactions. Event streaming is
analogous to our nervous system.
An event is "something that happens," and the ability to react immediately and review later is an essential concept of an event streaming platform.
Kafka acts as a central nervous system for your data and simplifies your event stream processing architecture.
The Kafka event streaming platform provides the core capabilities for
you to implement your event streaming application from end-to-end by
delivering the three main components of publish/consume, durable
storage, and processing.
Kafka brokers are the storage layer and service requests from clients for
writing and reading records. The brokers store records as bytes and do
not touch or alter the contents.
Schema Registry provides a way to ensure compatibility of records
between producers and consumers.
Producer clients write (produce) records to the broker. Consumer clients
consume records from the broker. The producer and consumer clients
are agnostic of each other. Additionally, the Kafka broker doesn't know who the individual clients are; it only processes the requests.
Kafka Connect provides a mechanism for integrating existing systems,
such as external storage for getting data into and out of Kafka.
Kafka Streams is the native stream processing library for Kafka. It runs
at the perimeter of a Kafka cluster, not inside the brokers, and provides
support for transforming data, including joins and stateful
transformations.
ksqlDB is an event streaming database for Kafka. It allows you to build
robust real-time systems with just a few lines of SQL.
[1] https://www.merriam-webster.com/dictionary/event
2 Kafka Brokers
This chapter covers
Explaining how the Kafka Broker is the storage layer in the Kafka event
streaming platform
Describing how Kafka brokers handle requests from clients for writing
and reading records
Understanding topics and partitions
Using JMX metrics to check for a healthy broker
In describing the broker behavior in this chapter, we’ll get into some lower-
level details. It’s essential to cover them to give you an understanding of how
the broker operates. Additionally, some of the things we’ll cover, such as
topics and partitions, are essential concepts you’ll need to understand when
we get into the client chapter. But as a developer, you won’t have to handle
these topics daily.
As the storage layer, the broker manages data, including retention and
replication. Retention is how long the brokers store records. Replication is
how brokers make copies of the data for durable storage, meaning you won’t
lose data if you lose a machine.
But the broker also handles requests from clients. Here’s an illustration
showing the client applications and the brokers:
Note
Kafka is a deep subject, so I won’t cover every aspect. I’ll review enough
information to get you started working with the Kafka event streaming
platform. For in-depth coverage, look at Kafka in Action by Dylan Scott
(Manning, 2018).
While you’re learning about the Kafka broker, I’ll need to talk about the
producer and consumer clients. But since this chapter is about the broker, I’ll
focus more on the broker’s responsibilities. So, I’ll leave out some of the
client details. But don’t worry; we’ll get to those details in a later chapter.
So, let’s get started with some walkthroughs of how a broker handles client
requests, starting with producing.
Now that we’ve walked through an example produce request, let’s walk
through another request type, fetch, which is the logical opposite of
producing records: consuming records.
It’s also important to note that producers and consumers are unaware of each
other. The broker handles produce and consume requests separately; one has
nothing to do with the other. The example here is simplified to emphasize the
overall action from the broker’s point of view.
1. The consumer sends a fetch request specifying the offset from which it
wants to start reading records. We’ll discuss offsets in more detail later
in the chapter.
2. The broker takes the fetch request out of the request queue.
3. Based on the offset and the topic partition in the request, the broker fetches a batch of records.
4. The broker sends the fetched batch of records in the response to the consumer.
Now that we've completed a walkthrough of two common request types, produce and fetch, I'm sure you noticed a few terms I still need to describe: topics, partitions, and offsets. These are fundamental concepts in Kafka, so let's take some time now to explore what they mean.
Specifically, Kafka brokers use the file system for storage by appending the incoming records to the end of a file in a topic. The topic provides the name of the directory containing the file to which the Kafka broker appends the records.
Note
Kafka receives the key-value pair messages as raw bytes, stores them that
way, and serves the read requests in the same format. The Kafka broker is
unaware of the type of record that it handles. By merely working with raw
bytes, the brokers don’t spend time deserializing or serializing the data,
allowing for higher performance. We'll see how you can ensure that topics contain the expected byte format when we cover Schema Registry in Chapter 3.
Topics have partitions, which are a way of further organizing the topic data into slots or buckets. Partitions are numbered with integers starting at 0, so if a topic has three partitions, the partition numbers are 0, 1, and 2. Kafka appends the partition number to the end of the topic name, creating the same number of directories as partitions, with the form topic-N, where N represents the partition number.
Kafka brokers have a configuration, log.dirs, where you place the top-level
directory’s name, which will contain all topic-partition directories. Let’s take
a look at an example. We will assume you’ve configured log.dirs with the
value /var/kafka/topic-data, and you have a topic named purchases with
three partitions.
/var/kafka/topic-data/purchases-0
├── 00000000000000000000.index
├── 00000000000000000000.log
├── 00000000000000000000.timeindex
└── leader-epoch-checkpoint
/var/kafka/topic-data/purchases-1
├── 00000000000000000000.index
├── 00000000000000000000.log
├── 00000000000000000000.timeindex
└── leader-epoch-checkpoint
/var/kafka/topic-data/purchases-2
├── 00000000000000000000.index
├── 00000000000000000000.log
├── 00000000000000000000.timeindex
└── leader-epoch-checkpoint
As you can see here, the topic purchases with three partitions ends up as
three directories, purchases-0, purchases-1, and purchases-2 on the file
system. The topic name is more of a logical grouping, while the partition is
the storage unit.
Tip
The directory structure shown here was generated using the tree command, a
small command line tool used to display all contents of a directory.
While we’ll want to discuss those directories' contents, we still have some
details about topic partitions to cover.
Topic partitions are the unit of parallelism in Kafka. For the most part, the
higher the number of partitions, the higher your throughput. As the primary
storage mechanism, topic partitions allow for the spreading of messages
across several machines. The given topic’s capacity isn’t limited to the
available disk space on a single broker. Also, as mentioned before,
replicating data across several brokers ensures you won’t lose data should a
broker lose disks or die.
Later in this chapter, we'll discuss load distribution more when we cover replication, leaders, and followers. We'll also cover a new feature, tiered storage, where data is seamlessly moved to external storage, providing virtually limitless capacity.
So, how does Kafka map records to partitions? The producer client
determines the topic and partition for the record before sending it to the
broker. Once the broker processes the record, it appends it to a file in the
corresponding topic-partition directory.
There are three possible ways of setting the partition for a record:
1. Kafka works with records in key-value pairs. If the key is non-null (keys are optional), the producer maps the record to a partition using a deterministic formula: the hash of the key modulo the number of partitions. This approach means that records with identical keys always land on the same partition.
2. When building the ProducerRecord in your application, you can explicitly set the partition for that record, which the producer then uses before sending it (see the sketch after this list).
3. If the message has no key or partition specified, then partitions are alternated per batch. I'll detail how Kafka handles records without keys and partition assignments in Chapter 4.
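Here is a small Java sketch illustrating the first two options. The hashing shown mirrors what the built-in partitioner does conceptually for keyed records (the real producer hashes the serialized key bytes with murmur2); the topic name, key, and partition count are assumptions for illustration.

import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.utils.Utils;

public class PartitioningSketch {
    // Option 1 (conceptually): hash the key bytes and take the result modulo the
    // partition count, so identical keys always land on the same partition
    static int partitionForKey(byte[] keyBytes, int numPartitions) {
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println("hashed partition: "
            + partitionForKey("customer-123".getBytes(), 3));

        // Option 2: explicitly pin the record to partition 0 when building it
        ProducerRecord<String, String> record =
            new ProducerRecord<>("purchases", 0, "customer-123", "flux capacitor");
        System.out.println("explicit partition: " + record.partition());
    }
}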
Now that we've covered how topic partitions work, let's revisit the fact that Kafka always appends records to the end of the file. I'm sure you noticed the files in the directory example with an extension of .log (we'll talk about how Kafka names these files in an upcoming section). But these log files aren't the type developers usually think of, where an application prints its status or execution steps. The term log here means a transaction log: a file storing a sequence of events in the order they occurred. So, each topic partition directory contains its own transaction log. At this point, it would be fair to ask about log file growth. We'll discuss log file size and management when we cover segments later in this chapter.
2.3.1 Offsets
Figure 2.5 Offsets indicate where a consumer has left off reading records
In the illustration, if a consumer reads records with offsets 0-5, the broker only fetches records starting at offset 6 in the following consumer request. The offsets used are unique for each consumer and are stored in an internal topic named __consumer_offsets. We'll go into more detail about consumers and offsets in Chapter 4.
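To show how offsets drive what a consumer reads next, here is a minimal Java consumer sketch that polls a batch, prints each record's offset, and commits; the committed position is what lets a restarted consumer pick up where it left off. The topic name, group id, and local broker address are assumptions, and Chapter 4 covers the consumer in depth.

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class OffsetSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "offset-demo");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");
        props.put("auto.offset.reset", "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("purchases"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition=%d offset=%d value=%s%n",
                    record.partition(), record.offset(), record.value());
            }
            // Committing records the next offset to read, so the following poll
            // (or a restarted consumer in the same group) continues from there
            consumer.commitSync();
        }
    }
}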
Now that we’ve covered topics, partitions, and offsets, let’s quickly discuss
some trade-offs regarding the number of partitions to use.
Here are some things to consider when setting the number of partitions. You want to choose a number high enough to cover high-throughput situations but not so high that you hit limits on the number of partitions a broker can handle as you create more and more topics. A good starting point is 30, which is evenly divisible by several numbers and therefore results in a more even distribution of keys in the processing layer. [2] We'll talk more about the importance of key distribution in later chapters on clients and Kafka Streams.
At this point, you’ve learned that the broker handles client requests and is the
storage layer for the Kafka event streaming platform. You’ve also learned
about topics, partitions, and their role in the storage layer.
Your next step is to get your hands dirty, producing and consuming records
to see these concepts in action.
Note
We’ll cover the producer and consumer clients in Chapter 4. Console clients
are helpful for learning, quick prototypes, and debugging. But in practice,
you’ll use the clients in your code.
Let’s start working with a Kafka broker by producing and consuming some
records.
Tip
Starting docker-compose with the -d flag runs the docker services in the background. While it's OK to start docker-compose without the -d flag, the containers print their output to the terminal, so you'll need to open a new terminal window to do any further operations.
Wait a few seconds, then run this command to open a shell on the docker
broker container: docker-compose exec broker bash.
Using the docker broker container shell you just opened up, run this
command to create a topic:
kafka-topics --create --topic first-topic \
  --bootstrap-server localhost:9092 \
  --replication-factor 1 \
  --partitions 1
Important
Since you’re running a local broker for testing, you don’t need a replication
factor greater than 1. The same thing goes for the number of partitions; at this
point, you only need one partition for this local development.
When using the console producer, you need to specify whether you will provide keys. Although Kafka works with key-value pairs, the key is optional and can be null. Since the key and value go on the same line, you must also specify how the console producer should parse the key and value by providing a delimiter.
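A command along the following lines, run from the same broker container shell, starts a console producer with key parsing enabled, using ":" as the key-value delimiter to match the sample input below:

kafka-console-producer --topic first-topic \
  --bootstrap-server localhost:9092 \
  --property parse.key=true \
  --property key.separator=":"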
After you enter the above command and hit enter, you should see a prompt
waiting for your input. Enter some text like the following:
key:my first message
key:is something
key:very simple