Kafka Streams in Action, Second Edition
1. About this book
2. Acknowledgements
3. Preface
4. PART 1: INTRODUCTION
5. 1 Welcome to the Kafka Event Streaming Platform
6. 2 Kafka Brokers
7. PART 2: GETTING DATA INTO KAFKA
8. 3 Schema Registry
9. 4 Kafka Clients
10. 5 Kafka Connect
11. PART 3: EVENT STREAM PROCESSING DEVELOPMENT
12. 6 Developing Kafka Streams
13. 7 Streams and State
14. 8 The KTable API
15. 9 Windowing and Timestamps
16. 10 The Processor API
17. 11 ksqlDB
18. 12 Spring Kafka
19. 13 Kafka Streams interactive queries
20. 14 Testing
21. Appendix A. Schema Compatibility Workshop
22. Appendix B. Working with Avro, Protobuf and JSON Schema
23. Appendix C. Understanding Kafka Streams architecture
24. Appendix D. Confluent Resources
25. Index
About this book
I wrote the second edition of Kafka Streams in Action to teach you how to build event streaming applications with Kafka Streams and the other components of the Kafka ecosystem: the producer and consumer clients, Kafka Connect, and Schema Registry. I took this approach because for your event-streaming application to be as effective as possible, you'll need not just Kafka Streams but these other essential tools as well. My approach to writing this book is a pair-programming perspective; I imagine myself sitting next to you as you write the code and learn the API. You'll learn about the Kafka broker and how the producer and consumer clients work. Then, you'll see how to manage schemas, their role with Schema Registry, and how Kafka Connect bridges external systems and Kafka. From there, you'll dive into Kafka Streams, first building a simple application, then adding more complexity as you dig deeper into the Kafka Streams API. You'll also learn about ksqlDB, testing, and, finally, integrating Kafka with the popular Spring Framework.

Who should read this book


Kafka Streams in Action is for any developer wishing to get into stream processing. While not strictly required, knowledge of distributed programming will help you understand Kafka and Kafka Streams. Knowledge of Kafka is beneficial but not required; I'll teach you what you need to know. Both experienced Kafka developers and those new to Kafka will learn how to develop compelling stream-processing applications with Kafka Streams.
Intermediate-to-advanced Java developers familiar with topics like
serialization will learn how to use their skills to build a Kafka Streams
application. The book’s source code is written in Java 17 and extensively
uses Java lambda syntax, so experience with lambdas (even from another
language) will be helpful.
How this book is organized: a roadmap
This book has three parts spread over 14 chapters. While the book’s title is
“Kafka Streams in Action”, it covers the entire Kafka event streaming
platform. As a result, the first five chapters cover the different components:
Kafka brokers, consumer and producer clients, Schema Registry, and Kafka
Connect. This approach makes sense, especially considering that Kafka
Streams is an abstraction over the consumer and producer clients. So, if
you’re already familiar with Kafka, Connect, and Schema Registry or if
you’re excited to get going with Kafka Streams, then by all means, skip
directly to Part 3.

Part 1 introduces event streaming and describes the different parts of the
Kafka ecosystem to show you the big-picture view of how it all works and
fits together. These chapters also provide the basics of the Kafka broker for
those who need them or want a review:

1. Chapter 1 provides some context on what an event is and what event streaming is, and why it's vital for working with real-time data. It also presents the mental model of the different components we'll cover: the broker, clients, Kafka Connect, Schema Registry, and, of course, Kafka Streams. I don't go over any code but describe how they all work.
2. Chapter 2 is a primer for developers who are new to Kafka, and it covers the role of the broker, topics, partitions, and some monitoring. Those with more experience with Kafka can skip this chapter.

Part 2 moves on and covers getting data into and out of Kafka and managing schemas:

- Chapter 3 covers using Schema Registry to help you manage the evolution of your data's schemas. Spoiler alert: you're always using a schema; if it's not explicit, it's implicitly there.
- Chapter 4 discusses the Kafka producer and consumer clients. The clients are how you get data into and out of Kafka, and they provide the building blocks for Kafka Connect and Kafka Streams.
- Chapter 5 is about Kafka Connect. Kafka Connect provides the ability to get data into Kafka via source connectors and export it to external systems with sink connectors.
Part 3 gets to the book's heart and covers developing Kafka Streams applications. In this section, you'll also learn about ksqlDB and testing your event-streaming application, and it concludes with integrating Kafka with the Spring Framework:

- Chapter 6 is your introduction to Kafka Streams, where you'll build a Hello World application and, from there, a more realistic application for a fictional retailer. Along the way, you'll learn about the Kafka Streams DSL.
- Chapter 7 continues your Kafka Streams learning path, where we discuss application state and why it's required for streaming applications. Some of the things you'll learn about in this chapter are aggregating data and joins.
- Chapter 8 covers the KTable API. Whereas a KStream is a stream of events, a KTable is a stream of related events, or an update stream.
- Chapter 9 covers windowed operations and timestamps. Windowing an aggregation allows you to bucket results by time, and the timestamps on the records drive the action.
- Chapter 10 dives into the Kafka Streams Processor API. Up to this point, you've been working with the high-level DSL, but here, you'll learn how to use the Processor API when you need more control.
- Chapter 11 takes you further into the development stack, where you'll learn about ksqlDB. ksqlDB allows you to write event-streaming applications with no code at all, using only SQL.
- Chapter 12 discusses using the Spring Framework with Kafka clients and Kafka Streams. Spring allows you to write more modular and testable code by providing a dependency injection framework for wiring up your applications.
- Chapter 13 introduces you to Kafka Streams Interactive Queries, or IQ. IQ is the ability to directly query the state store of a stateful operation in Kafka Streams. You'll use what you learned in Chapter 12 to build a Spring-enabled IQ web application.
- Chapter 14 covers the all-important topic of testing. You'll learn how to test client applications and a Kafka Streams topology, the difference between unit testing and integration testing, and when to apply each.
- Appendix A contains a workshop on Schema Registry to get hands-on experience with the different schema compatibility modes.
- Appendix B is a survey of working with the different schema types: Avro, Protobuf, and JSON Schema.
- Appendix C covers the architecture and internals of Kafka Streams.
- Appendix D presents information on using Confluent Cloud to help develop your event streaming applications.

About the code


This book contains many examples of source code both in numbered listings and inline with normal text. In both cases, source code is formatted in a fixed-width font like this to separate it from ordinary text.

In many cases, the original source code has been reformatted; we've added line breaks and reworked indentation to accommodate the available page space in the book. In rare cases, even this was not enough, and listings include line-continuation markers (➥).

Additionally, comments in the source code have often been removed from the listings when the code is described in the text. Code annotations accompany many of the listings, highlighting important concepts.

Finally, it's important to note that many of the code examples aren't meant to stand on their own: they're excerpts containing only the most relevant parts of what is currently under discussion. You'll find all the examples from the book in the accompanying source code in their complete form.

Source code for the book's examples is available from GitHub at https://github.com/bbejeck/KafkaStreamsInAction2ndEdition and the publisher's website at www.manning.com/books/kafka-streams-in-action-second-edition. The source code for the book is an all-encompassing project using the build tool Gradle (https://gradle.org). You can import the project into either IntelliJ or Eclipse using the appropriate commands. Full instructions for using and navigating the source code can be found in the accompanying README.md file.

Other online resources


1. Apache Kafka documentation: https://kafka.apache.org
2. Confluent documentation: https://docs.confluent.io/current
3. Kafka Streams documentation: https://docs.confluent.io/current/streams/index.html#kafka-streams
4. ksqlDB documentation: https://ksqldb.io/
5. Spring Framework: https://spring.io/
Acknowledgements
I want to thank my wife, Beth, for supporting my signing up for a second
edition. Writing the first edition of a book is very time-consuming, so you’d
think the second edition would be more straightforward, just making
adjustments for API changes. But in this case, I wanted more from my
previous work and decided to do an entire rewrite. Beth never questioned my
decision and fully supported my new direction, and as before, I couldn’t have
completed this without her support. Beth, you are fantastic, and I’m very
grateful to have you as my wife. I’d also like to thank my three children for
having great attitudes and supporting me in doing a second edition.

Next, I thank my editor at Manning, Frances Lefkowitz, whose continued expert guidance and patience made the writing process fun this time. I also
thank John Guthrie for his excellent, precise technical feedback and Karsten
Strøbæk, the technical proofer, for his superb work reviewing the code. I’d
also like to thank the Kafka Streams developers and community for being so
engaging and brilliant in making Kafka Streams the best stream processing
library available. I want to acknowledge all the Kafka developers for building
such high-quality software, especially Jay Kreps, Neha Narkhede, and Jun
Rao, not only for starting Kafka in the first place but for creating such a great
place to work at Confluent.

Last but certainly not least, I thank the reviewers for their hard work and
invaluable feedback in making the quality of this book better for all readers.
Preface
After completing the first edition of Kafka Streams in Action, I thought that I
had accomplished everything I had set out to do. But as time went on, my
understanding of the Kafka ecosystem and my appreciation for Kafka
Streams grew. I saw that Kafka Streams was more powerful than I had
initially thought. Additionally, I noticed other important pieces in building
event-streaming applications; Kafka Streams is still a key player but not the
only requirement. I realized that Apache Kafka could be considered the
central nervous system for an organization’s data. If Kafka is the central
nervous system, then Kafka Streams is a vital organ performing some
necessary operations.

But Kafka Streams relies on other components to bring events into Kafka or
export them to the outside world where its results and calculations can be put
to good use. I’m talking about the producer and consumer clients and Kafka
Connect. As I put the pieces together, I realized you need these other
components to complete the event-streaming picture. Couple all this with
some significant improvements to Kafka Streams since 2018, and I knew I
wanted to write a second edition.

But I didn’t just want to brush up on the previous edition; I wanted to express
my improved understanding and add complete coverage of the entire Kafka
ecosystem. This meant expanding the scope of some subjects from sections
of chapters to whole chapters (like the producer and consumer clients), or
adding entirely new chapters (such as the new chapters on Connect and
Schema Registry). For the existing Kafka Streams chapters, writing a second
edition meant updating and improving the existing material to clarify and
communicate my deeper understanding.

Taking on the second edition with this new focus during the pandemic was
not easy and not without some serious personal challenges along the way. But
in the end, it was worth every minute of it, and if I were to go back in time, I
would make the same decision. I hope that new readers of Kafka Streams in
Action will find the book an essential resource and that readers from the first
edition will enjoy and apply the improvements as well.
PART 1: INTRODUCTION
In part one, you’ll learn about events and event streaming in general. Event
streaming is a software development approach that considers events as an
application’s primary input and output. But to develop an effective event
streaming application, you’ll first need to learn what an event is (spoiler alert:
it’s everything!). Then you’ll read about what use cases are good candidates
for event-streaming applications and which are not.

First, you’ll discover what a Kafka broker is and how it’s at the heart of the
Kafka ecosystem, and the various jobs it performs. Then you’ll learn what
Schema Registry, producer and consumer clients, Connect, and Kafka
Streams are and their different roles. Then you’ll learn about the Apache
Kafka event streaming platform; although this book focuses on Kafka
Streams, it’s part of a larger whole that allows you to develop event-
streaming applications. If this first part leaves you with more questions than
answers, don’t fret; I’ll explain them all in subsequent chapters.
1 Welcome to the Kafka Event Streaming Platform
This chapter covers
Defining event streaming and events
Introducing the Kafka event streaming platform
Applying the platform to a concrete example

While the constant influx of data creates more entertainment and opportunities for the consumer, increasingly, the users of this information are software systems using other software systems. Think, for example, of the fundamental interaction of watching a movie from your favorite movie
software systems using other software systems. Think, for example, of the
fundamental interaction of watching a movie from your favorite movie
streaming application. You log into the application, search for and select a
film, then watch it, and afterward, you may provide a rating or some
indication of how you enjoyed the movie. Just this simple interaction
generates several events captured by the movie streaming service. But this
information needs analysis if it’s to be of use to the business. That’s where all
the other software comes into play.

First, the software systems consume and store all the information obtained
from your interaction and the interactions of other subscribers. Then,
additional software systems use that information to make recommendations
to you and to provide the streaming service with insight on what
programming to provide in the future. Now, consider that this process occurs
hundreds of thousands or even millions of times per day, and you can see the
massive amount of information that businesses need to harness and that their
software needs to make sense of to meet customer demands and expectations
and stay competitive.

Another way to think of this process is that everything modern-day consumers do, from streaming a movie online to purchasing a pair of shoes at
a brick-and-mortar store, generates an event. For an organization to survive
and excel in our digital economy, it must have an efficient way of capturing
and acting on these events. In other words, businesses must find ways to keep
up with the demand of this endless flow of events if they want to satisfy
customers and maintain a robust bottom line. Developers call this constant
flow an event stream. And, increasingly, they are meeting the demands of this
endless digital activity with an event-streaming platform, which utilizes a
series of event-streaming applications.

An event-streaming platform is analogous to our central nervous system, which processes millions of events (nerve signals) and, in response, sends out
messages to the appropriate parts of the body. Our conscious thoughts and
actions generate some of these responses. When we are hungry and open the
refrigerator, the central nervous system gets the message and sends out
another one, telling the arm to reach for a nice red apple on the first shelf.
Other actions, such as your heart rate increasing in anticipation of exciting
news, are handled unconsciously.

An event-streaming platform captures events generated from mobile devices, customer interaction with websites, online activity, shipment tracking, and
other business transactions. But the platform, like the nervous system, does
more than capture events. It also needs a mechanism to reliably transfer and
store the information from those events in the order in which they occurred.
Then, other applications can process or analyze the events to extract different
bits of that information.

Processing the event stream in real time is essential for making time-sensitive
decisions. For example, Does this purchase from customer X seem
suspicious? Are the signals from this temperature sensor indicating
something has gone wrong in a manufacturing process? Has the routing
information been sent to the appropriate department of a business?

But the value of an event-streaming platform goes beyond gaining immediate information. Providing durable storage allows us to go back and look at
event-stream data in its raw form, perform some manipulation of the data for
more insight, or replay a sequence of events to try and understand what led to
a particular outcome. For example, an e-commerce site offers a fantastic deal
on several products on the weekend after a big holiday. The response to the
sale is so strong that it crashes a few servers and brings the business down for
a few minutes. By replaying all customer events, engineers can better
understand what caused the breakdown and how to fix the system so it can
handle a large, sudden influx of activity.

So, where do you need event-streaming applications?

Since everything in life can be considered an event, any problem domain will
benefit from processing event streams. But there are some areas where it’s
more important to do so. Here are some typical examples:

- Credit card fraud: A credit card owner may be unaware of unauthorized use. By reviewing purchases as they happen against established patterns (location, general spending habits), you may be able to detect a stolen credit card and alert the owner.
- Intrusion detection: The ability to monitor aberrant behavior in real time is critical for the protection of sensitive data and the well-being of an organization.
- The Internet of Things: With IoT, sensors are located in all kinds of places and send back data frequently. The ability to quickly capture and process this data meaningfully is essential; anything less diminishes the effect of deploying these sensors.
- The financial industry: The ability to track market prices and direction in real time is essential for brokers and consumers to make effective decisions about when to sell or buy.
- Sharing data in real time: Large organizations, like corporations or conglomerates, that have many applications need to share data in a standard, accurate, and real-time way.

Bottom line: If the event stream provides essential and actionable information, businesses and organizations need event-driven applications to capitalize on the information provided.

But streaming applications are only a fit for some situations. Event-streaming
applications become necessary when you have data in different places or a
large volume of events requiring distributed data stores. So, if you can
manage with a single database instance, streaming is unnecessary. For
example, a small e-commerce business or a local government website with
primarily static data aren’t good candidates for building an event-streaming
solution.
In this book, you’ll learn about event-stream development, when and why it’s
essential, and how to use the Kafka event-streaming platform to build robust
and responsive applications. You’ll learn how to use the Kafka streaming
platform’s various components to capture events and make them available for
other applications. We’ll cover using the platform’s components for simple
actions such as writing (producing) or reading (consuming) events to
advanced stateful applications requiring complex transformations so you can
solve the appropriate business challenges with an event-streaming approach.
This book is suitable for any developer looking to get into building event-
streaming applications.

Although the title, "Kafka Streams in Action," focuses on Kafka Streams, this
book teaches the entire Kafka event-streaming platform, end to end. That
platform includes crucial components, such as producers, consumers, and
schemas, that you must work with before building your streaming apps,
which you’ll learn in Part 1. As a result, we don’t get into the subject of
Kafka Streams itself until later in the book, in Chapter 6. But the enhanced
coverage is worth it; Kafka Streams is an abstraction built on top of
components of the Kafka event streaming platform, so understanding them
gives you a better grasp of how you can use Kafka Streams.

1.1 What is an event?


So we’ve defined an event stream, but what is an event? We’ll define an
event simply as "something that happens"[1]. While the term event probably
brings to mind something notable happening, like the birth of a child, or a
wedding or sporting event, we’re going to focus on smaller, more constant
events like a customer making a purchase (online or in-person), or clicking a
link on a web-page, or a sensor transmitting data. Either people or machines
can generate events. It’s the sequence of events and the constant flow of them
that make up an event stream.

Events conceptually contain three main components:

1. Key - an identifier for the event
2. Value - the event itself
3. Timestamp - when the event occurred
Let’s discuss each of these parts of an event in more detail. The key could be
an identifier for the event, and as we’ll learn in later chapters, it plays a role
in routing and grouping events. Think of an online purchase, and using the
customer ID is an excellent example of the key. The value is the event
payload itself. The event value could be a trigger, such as activating a sensor
when someone opens a door or a result of some action like the item
purchased in the online sale. Finally, the timestamp is the date-time recording when the event occurred. As we go through the various chapters in
this book, we’ll encounter all three components of this "event trinity"
regularly.
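
To make the three parts concrete, here is a tiny sketch of how an event could be modeled in Java. The PurchaseEvent type and its field names are only illustrative, not a Kafka API; later chapters use Kafka's own record classes instead.

import java.time.Instant;

// A minimal model of the "event trinity": key, value, and timestamp.
// PurchaseEvent is a made-up illustration, not a Kafka class.
public record PurchaseEvent(String key, String value, Instant timestamp) { }

// Example: a purchase keyed by customer ID, with the payload and the time it happened
// var event = new PurchaseEvent("customer-123", "flux-capacitor", Instant.now());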

I’ve used a lot of different terms in this introduction, so let’s wrap this section
up with a table of definitions:

Event: Something that occurs, with attributes about it recorded.

Event Stream: A series of events captured in real time from sources such as mobile or IoT devices.

Event Streaming Platform: Software to handle event streams, capable of producing, consuming, processing, and storing event streams.

Apache Kafka: The premier event streaming platform; it provides all the components of an event streaming platform in one battle-tested solution.

Kafka Streams: The native event stream processing library for Kafka.

1.2 An event stream example


Let’s say you’ve purchased a Flux Capacitor and are excited to receive your
new purchase. Let’s walk through the events leading up to the time you get
your brand new Flux Capacitor, using the following illustration as your
guide.

Figure 1.1 A sequence of events comprising an event stream, starting with the online purchase of the flux capacitor
1. You complete the purchase on the retailer’s website, and the site
provides a tracking number.
2. The retailer’s warehouse receives the purchase event information and
puts the Flux Capacitor on a shipping truck, recording the date and time
your purchase left the warehouse.
3. The truck arrives at the airport, and the driver loads the Flux Capacitor
on a plane and scans a barcode recording the date and time.
4. The plane lands, and the package is loaded on a truck again headed for
the regional distribution center. The delivery service records the date
and time they loaded your Flux Capacitor.
5. The truck from the airport arrives at the regional distribution center. A
delivery service employee unloads the Flux Capacitor, scanning the date
and time of the arrival at the distribution center.
6. Another employee takes your Flux Capacitor, scans the package, saves
the date and time, and loads it on a truck bound for delivery to you.
7. The driver arrives at your house, scans the package one last time, and
hands it to you. You can start building your time-traveling car!

From our example here, you can see how everyday actions create events,
hence an event stream. The individual events are the initial purchase, each
time the package changes custody, and the final delivery. This scenario
represents events generated by just one purchase. But if you think of the
event streams generated by purchases from Amazon and the various shippers
of the products, the number of events could easily number in the billions or
trillions.

1.3 Introducing the Apache Kafka® event streaming platform
The Kafka event streaming platform provides the core capabilities to
implement your event streaming application from end-to-end. We can break
down these capabilities into three main areas: publishing/consuming, durable
storage, and processing. This move, store, and process trilogy enables Kafka
to operate as the central nervous system for your data.

Before we go on, it will be helpful to illustrate what it means for Kafka to be the central nervous system for your data. We'll do this by showing before and after illustrations.

Let's first look at an event-streaming solution where each input source requires separate infrastructure:

Figure 1.2 Initial event-streaming architecture leads to complexity as the different departments
and data stream sources need to be aware of the other sources of events
In the above illustration, individual departments create separate
infrastructures to meet their requirements. However, other departments may
be interested in consuming the same data, which leads to a more complicated
architecture to connect the various input streams.

Let’s look at how the Kafka event streaming platform can change things.

Figure 1.3 Using the Kafka event streaming platform, the architecture is simplified
As you can see from this updated illustration, adding the Kafka event
streaming platform dramatically simplifies the architecture. All components
now send their records to Kafka. Additionally, consumers read data from
Kafka with no awareness of the producers.

At a high level, Kafka is a distributed system of servers and clients. The servers are called brokers; the clients are record producers sending records to the brokers, and the consumer clients read records for the processing of events.
1.3.1 Kafka brokers

Kafka brokers durably store your records, in contrast with traditional messaging systems (RabbitMQ or ActiveMQ), where the messages are ephemeral. The brokers store the data agnostically as key-value pairs (and some other metadata fields) in byte format; the records are somewhat of a black box to the broker.

Preserving events has more profound implications concerning the difference between messages and events. You can think of messages as "tactical" communication between two machines, while events represent business-critical data you don't want to throw away.

Figure 1.4 You deploy brokers in a cluster, and brokers replicate data for durable storage
This illustration shows that Kafka brokers are the storage layer within the
Kafka architecture and sit in the "storage" portion of the event-streaming
trilogy. But in addition to acting as the storage layer, the brokers provide
other essential functions such as serving client requests and coordinating with
consumers. We’ll go into details of broker functionality in Chapter 2.

1.3.2 Schema registry


Figure 1.5 Schema registry enforces data modeling across the platform
Data governance is vital to begin with, and its importance only increases as the size and diversity of an organization grow. Schema Registry stores the schemas of the event records. Schemas enforce a contract for data between producers and consumers. Schema Registry also provides serializers and deserializers for the different tools that are Schema Registry aware. Providing (de)serializers means you don't have to write your own serialization code. We'll cover Schema Registry in Chapter 3.

1.3.3 Producer and consumer clients


Figure 1.6 Producers write records into Kafka, and consumers read records

The producer client is responsible for sending records into Kafka. The consumer is responsible for reading records from Kafka. These two clients form the basic building blocks for creating an event-driven application and are agnostic to each other, allowing for greater scalability. The producer and consumer clients also form the foundation for any higher-level abstraction working with Apache Kafka. We cover clients in Chapter 4.

1.3.4 Kafka Connect


Figure 1.7 Kafka Connect bridges the gap between external systems and Apache Kafka

Kafka Connect provides an abstraction over the producer and consumer clients for importing data to and exporting data from Apache Kafka. Kafka Connect is essential for connecting external data stores with Apache Kafka. It also provides an opportunity to perform lightweight data transformations with Single Message Transforms (SMTs) when exporting or importing data. We'll go into the details of Kafka Connect in a later chapter.

1.3.5 Kafka Streams


Figure 1.8 Kafka Streams is the stream processing API for Kafka

Kafka Streams is Kafka's native stream processing library. Kafka Streams is written in the Java programming language and is used by client applications at the perimeter of a Kafka cluster; it is not run inside a Kafka broker. It supports performing operations on event data, including transformations and stateful operations like joins and aggregations. Kafka Streams is where you'll do the heart of your work when dealing with events. Chapters 6, 7, 8, 9, and 10 cover Kafka Streams in detail.
1.3.6 ksqlDB

ksqlDB is an event streaming database: it provides a SQL interface for event stream processing. Under the covers, ksqlDB uses Kafka Streams to perform its event streaming tasks. A key advantage of ksqlDB is that it allows you to specify your event streaming operations in SQL; no code is required. We'll discuss ksqlDB in chapter 11.

Figure 1.9 ksqlDB provides streaming database capabilities


Now that we've gone over how the Kafka event streaming platform works, including the individual components, let's apply it to a concrete example of a retail operation demonstrating how the platform works.

1.4 A concrete example of applying the Kafka event streaming platform
Let’s say there is a consumer named Jane Doe, and she checks her email.
There’s one email from ZMart with a link to a page on the ZMart website
containing coupons for 15% off the total purchase price. Once on the web
page, Jane clicks another link to activate and print the coupons. While this
whole sequence is just another online purchase for Jane, it represents
clickstream events for ZMart.

Let’s pause our scenario to discuss the relationship between these simple
events and how they interact with the Kafka event streaming platform.

The data generated by the initial clicks to navigate to and print the coupons
create clickstream information captured and produced directly into Kafka
with a producer microservice. The marketing department started a new
campaign and wants to measure its effectiveness, so the clickstream events
available here are valuable.

The first sign of a successful project is that users click on the email links to
retrieve the coupons. Additionally, the data science group is also interested in
the pre-purchase clickstream data. The data science team can track customers'
actions and attribute purchases to those initial clicks and marketing
campaigns. The amount of data from this single activity may seem minor.
You have a significant amount of data when you factor in a large customer
base and several different marketing campaigns.

Now, let’s resume our shopping example.

It's late summer, and Jane has been meaning to go shopping to get her children's back-to-school supplies. Since tonight is a rare night with no family activities, Jane stops off at ZMart on her way home.
Walking through the store after grabbing everything she needs, Jane walks by
the footwear section and notices some new designer shoes that would go
great with her new suit. She realizes that’s not what she came in for, but what
the heck? Life is short (ZMart thrives on impulse purchases!), so Jane gets
the shoes.

As Jane reaches the self-checkout aisle, she scans her ZMart member card.
After scanning all the items, she scans the coupon, which reduces the
purchase by 15%. Then Jane pays for the transaction with her debit card,
takes the receipt, and walks out of the store. A little later that evening, Jane
checked her email, and there was a message from ZMart thanking her for her
patronage with coupons for discounts on a new line of designer clothes.

Let's dissect the purchase transaction and see how this event triggers a sequence of operations performed by the Kafka event streaming platform.

So now ZMart's sales data streams into Kafka. ZMart uses Kafka Connect to create a source connector to capture the sales as they occur and send them to Kafka. The sale transaction brings us to the first requirement: the protection of customer data. Here, ZMart uses an SMT, or Single Message Transform, to mask the credit card data as it goes into Kafka.

Figure 1.10 Sending all of the sales data directly into Kafka, with Connect masking the credit card numbers as part of the process
As Connect writes records into Kafka, different organizations within ZMart immediately consume them. The department in charge of promotions created an application that consumes the sales data and assigns purchase rewards if the customer is a loyalty club member. If the customer reaches a threshold for earning a bonus, an email with a coupon goes out to the customer.

Figure 1.11 Marketing department application for processing customer points and sending out
earned emails
It’s important to note that ZMart processes sales records immediately after
the sale. So, customers get timely emails with their rewards within a few
minutes of completing their purchases. Acting on the purchase events as they
happen allows ZMart a quick response time to offer customer bonuses.

The Data Science group within ZMart uses the sales data topic as well. The
DS group uses a Kafka Streams application to process the sales data, building
up purchase patterns of what customers in different locations are purchasing
the most. The Kafka Streams application crunches the data in real-time and
sends the results to a sales-trends topic.
Figure 1.12 Kafka Streams application crunching sales data and Kafka Connect exporting the
data for a dashboard application

ZMart uses another Kafka connector to export the sales trends to an external
application that publishes the results in a dashboard. Another group also
consumes from the sales topic to keep track of inventory and order new items
if they drop below a given threshold, signaling the need to order more of that
product.

At this point, you can see how ZMart leverages the Kafka platform. It is
important to remember that with an event streaming approach, ZMart
responds to data as it arrives, allowing them to immediately make quick and
efficient decisions. Also, note how the data is written into Kafka once, yet multiple groups consume it at different times, independently, so that one group's activity doesn't impede another's.

1.5 Summary
- Event streaming captures events generated from different sources like mobile devices, customer interaction with websites, online activity, shipment tracking, and business transactions. Event streaming is analogous to our nervous system.
- An event is "something that happens," and the ability to react immediately and review later is an essential concept of an event streaming platform.
- Kafka acts as a central nervous system for your data and simplifies your event stream processing architecture.
- The Kafka event streaming platform provides the core capabilities for you to implement your event streaming application from end to end by delivering the three main components of publish/consume, durable storage, and processing.
- Kafka brokers are the storage layer and service requests from clients for writing and reading records. The brokers store records as bytes and do not touch or alter the contents.
- Schema Registry provides a way to ensure compatibility of records between producers and consumers.
- Producer clients write (produce) records to the broker. Consumer clients consume records from the broker. The producer and consumer clients are agnostic of each other. Additionally, the Kafka broker doesn't know who the individual clients are; it only processes the requests.
- Kafka Connect provides a mechanism for integrating existing systems, such as external storage, for getting data into and out of Kafka.
- Kafka Streams is the native stream processing library for Kafka. It runs at the perimeter of a Kafka cluster, not inside the brokers, and provides support for transforming data, including joins and stateful transformations.
- ksqlDB is an event streaming database for Kafka. It allows you to build robust real-time systems with just a few lines of SQL.

[1] https://www.merriam-webster.com/dictionary/event
2 Kafka Brokers
This chapter covers
Explaining how the Kafka Broker is the storage layer in the Kafka event
streaming platform
Describing how Kafka brokers handle requests from clients for writing
and reading records
Understanding topics and partitions
Using JMX metrics to check for a healthy broker

In Chapter One, I provided an overall view of the Kafka event streaming platform and the different components that make up the platform. This chapter will focus on the system's heart, the Kafka broker. The Kafka broker is the server in the Kafka architecture and serves as the storage layer.

In describing the broker behavior in this chapter, we’ll get into some lower-
level details. It’s essential to cover them to give you an understanding of how
the broker operates. Additionally, some of the things we’ll cover, such as
topics and partitions, are essential concepts you’ll need to understand when
we get into the client chapter. But as a developer, you won’t have to handle
these topics daily.

As the storage layer, the broker manages data, including retention and
replication. Retention is how long the brokers store records. Replication is
how brokers make copies of the data for durable storage, meaning you won’t
lose data if you lose a machine.

But the broker also handles requests from clients. Here’s an illustration
showing the client applications and the brokers:

Figure 2.1 Clients communicating with brokers


To give you a quick mental model of the broker’s role, we can summarize the
illustration above: Clients send requests to the broker. The broker then
processes those requests and sends a response. While I’m glossing over
several details of the interaction, that is the gist of the operation.

Note

Kafka is a deep subject, so I won’t cover every aspect. I’ll review enough
information to get you started working with the Kafka event streaming
platform. For in-depth coverage, look at Kafka in Action by Dylan Scott
(Manning, 2018).

You can deploy Kafka brokers on commodity hardware, containers, virtual machines, or cloud environments. In this book, you'll use Kafka in a docker container, so you won't need to install it directly. I'll cover the necessary Kafka installation in an appendix.

While you’re learning about the Kafka broker, I’ll need to talk about the
producer and consumer clients. But since this chapter is about the broker, I’ll
focus more on the broker’s responsibilities. So, I’ll leave out some of the
client details. But don’t worry; we’ll get to those details in a later chapter.

So, let’s get started with some walkthroughs of how a broker handles client
requests, starting with producing.

2.1 Produce record requests


When a client wants to send records to the broker, it does so with a produce
request. Clients send records to the broker for storage so that consuming
clients can later read those records.

Here's an illustration of a producer sending records to a broker. It's important to note these illustrations aren't drawn to scale. Typically, you'll have many clients communicating with several brokers in a cluster. A single client will work with more than one broker. But it's easier to get a mental picture of what's happening if I keep the illustrations simple. Also, note that I'm simplifying the interaction, but we'll cover more details when discussing clients in Chapter 4.

Figure 2.2 Brokers handling produce records request


Let’s walk through the steps in the "Producing records" illustration.

1. The producer sends a batch of records to the broker. Whether a producer or consumer, the client APIs always work with a collection of records to encourage batching.
2. The broker takes the produce request out of the request queue.
3. The broker stores the records in a topic. Inside the topic are partitions. A single batch of records belongs to a specific partition within a topic, and the records are always appended at the end.
4. Once the broker stores the records, it responds to the producer. We'll talk more about what makes up a successful write later in this chapter and again in chapter 4.
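
To make the walkthrough concrete, here is a minimal sketch of a producer client sending a single record. The broker address, topic name, and key-value pair are assumptions for illustration, and Chapter 4 covers the producer API properly.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProduceRequestSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");               // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The producer batches records per topic-partition and sends them in a produce request
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("purchases", "customer-123", "flux-capacitor");
            producer.send(record, (metadata, exception) -> {
                if (exception == null) {
                    // The broker's response includes the partition and the offset it assigned
                    System.out.printf("stored in partition %d at offset %d%n",
                            metadata.partition(), metadata.offset());
                }
            });
        } // closing the producer flushes any records still waiting in a batch
    }
}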

Now that we’ve walked through an example produce request, let’s walk
through another request type, fetch, which is the logical opposite of
producing records: consuming records.

2.2 Consume record requests


Now, let’s look at the other side of the coin, from a produce request to a
consume request. Consumer clients issue requests to a broker to read (or
consume) records from a topic. A critical point to understand is that
consuming records does not affect data retention or records availability to
other consuming clients. Kafka brokers can handle hundreds of consumer
requests for records from the same topic, and each request has no impact on
the others. We’ll get into data retention later, but the broker handles it
separately from consumers.

It’s also important to note that producers and consumers are unaware of each
other. The broker handles produce and consume requests separately; one has
nothing to do with the other. The example here is simplified to emphasize the
overall action from the broker’s point of view.

Figure 2.3 Brokers handling requests from a consumer


So, let’s go through the steps of the illustrated consumer request.

1. The consumer sends a fetch request specifying the offset from which it wants to start reading records. We'll discuss offsets in more detail later in the chapter.
2. The broker takes the fetch request out of the request queue.
3. Based on the offset and the topic partition in the request, the broker fetches a batch of records.
4. The broker sends the fetched batch of records in the response to the consumer.
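
Here is a matching sketch of the consuming side; the group id and topic name are made up for illustration, and group management, offsets, and commit behavior are covered properly in Chapter 4.

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class FetchRequestSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");                  // assumed broker address
        props.put("group.id", "purchases-readers");                        // hypothetical consumer group
        props.put("auto.offset.reset", "earliest");                        // start at the first offset if none is stored
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("purchases"));
            // Each poll issues fetch requests; the broker returns a batch starting at the consumer's current offset
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition %d, offset %d, key %s, value %s%n",
                        record.partition(), record.offset(), record.key(), record.value());
            }
        }
    }
}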
Now that we've completed a walkthrough of two common request types, produce and fetch, I'm sure you noticed a few terms I still need to describe: topics, partitions, and offsets. Topics, partitions, and offsets are fundamental, essential concepts in Kafka, so let's take some time now to explore what they mean.

2.3 Topics and partitions


In chapter one, we discussed that Kafka provides storage for data. Kafka
durably stores your data as an unbounded series of key-value pair messages
for as long as you want (messages contain other fields, such as a timestamp,
but we’ll get to those details later). Kafka replicates data across multiple
brokers, so losing a disk or an entire broker means no data is lost.

Specifically, Kafka brokers use the file system for storage by appending the
incoming records to the end of a file in a topic. A topic represents the
directory’s name containing the file to which the Kafka broker appends the
records.

Note

Kafka receives the key-value pair messages as raw bytes, stores them that way, and serves the read requests in the same format. The Kafka broker is unaware of the type of record that it handles. By merely working with raw bytes, the brokers don't spend time deserializing or serializing the data, allowing for higher performance. We'll see how you can ensure that topics contain the expected byte format when we cover Schema Registry in Chapter 3.

Topics have partitions, which are a way of further organizing the topic data into slots or buckets. A partition is identified by an integer starting at 0, so if a topic has three partitions, the partition numbers are 0, 1, and 2. Kafka appends the partition number to the end of the topic name, creating the same number of directories as partitions, with the form topic-N, where N represents the partition number.

Kafka brokers have a configuration, log.dirs, where you place the top-level
directory’s name, which will contain all topic-partition directories. Let’s take
a look at an example. We will assume you’ve configured log.dirs with the
value /var/kafka/topic-data, and you have a topic named purchases with
three partitions.

Listing 2.1 Topic directory structure example

root@broker:/# tree /var/kafka/topic-data/purchases*

/var/kafka/topic-data/purchases-0
├── 00000000000000000000.index
├── 00000000000000000000.log
├── 00000000000000000000.timeindex
└── leader-epoch-checkpoint
/var/kafka/topic-data/purchases-1
├── 00000000000000000000.index
├── 00000000000000000000.log
├── 00000000000000000000.timeindex
└── leader-epoch-checkpoint
/var/kafka/topic-data/purchases-2
├── 00000000000000000000.index
├── 00000000000000000000.log
├── 00000000000000000000.timeindex
└── leader-epoch-checkpoint

As you can see here, the topic purchases with three partitions ends up as
three directories, purchases-0, purchases-1, and purchases-2 on the file
system. The topic name is more of a logical grouping, while the partition is
the storage unit.

Tip

The directory structure shown here was generated using the tree command, a
small command line tool used to display all contents of a directory.

While we’ll want to discuss those directories' contents, we still have some
details about topic partitions to cover.

Topic partitions are the unit of parallelism in Kafka. For the most part, the
higher the number of partitions, the higher your throughput. As the primary
storage mechanism, topic partitions allow for the spreading of messages
across several machines. The given topic’s capacity isn’t limited to the
available disk space on a single broker. Also, as mentioned before,
replicating data across several brokers ensures you won’t lose data should a
broker lose disks or die.

Later in this chapter, we'll discuss load distribution more when discussing replication, leaders, and followers. We'll also cover a new feature, tiered storage, where data is seamlessly moved to external storage, providing virtually limitless capacity.

So, how does Kafka map records to partitions? The producer client
determines the topic and partition for the record before sending it to the
broker. Once the broker processes the record, it appends it to a file in the
corresponding topic-partition directory.

There are three possible ways of setting the partition for a record:

1. Kafka works with records in key-value pairs. Suppose the key is non-
null (keys are optional). In that case, the producer maps the record to a
partition using the deterministic formula of taking the hash of the key
modulo the number of partitions. This approach means that records with
identical keys always land on the same partition.
2. When building the ProducerRecord in your application, you can
explicitly set the partition for that record, which the producer then uses
before sending it.
3. If the message has no key or partition specified, then partitions are
alternated per batch. I’ll detail how Kafka handles records without keys
and partition assignments in chapter four.
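
The keyed case from point 1 can be sketched as follows. This is only an illustration of the "hash of the key modulo the number of partitions" idea; the actual default partitioner hashes the serialized key bytes with murmur2 rather than calling hashCode.

// Illustrative only: not the producer's actual implementation.
static int partitionFor(String key, int numPartitions) {
    // Mask the sign bit so the result is non-negative, then take the modulo
    return (key.hashCode() & 0x7fffffff) % numPartitions;
}

// With three partitions, the same key always maps to the same partition:
// partitionFor("customer-123", 3) returns the same value every time,
// so all events for that customer land in one partition, in order.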

Now that we've covered how topic partitions work, let's revisit the fact that Kafka always appends records to the end of the file. I'm sure you noticed the files in the directory example with an extension of .log (we'll talk about how Kafka names this file in an upcoming section). But these log files aren't the type developers usually think of, where an application prints its status or execution steps. The term log here means a transaction log, storing a sequence of events in the order of occurrence. So, each topic partition directory contains its own transaction log. At this point, it would be fair to ask about log file growth. We'll discuss log file size and management when we cover segments later in this chapter.

2.3.1 Offsets

As the broker appends each record, it assigns it an ID called an offset. An offset is a number (starting at 0) that the broker increments by 1 for each record. In addition to being a unique ID, it represents the logical position in the file. The term logical position means it's the nth record in the file, but its physical location is determined by the size in bytes of the preceding records. In a later section, we'll talk about how brokers use an offset to find the physical position of a record. The following illustration demonstrates the concept of offsets for incoming records:

Figure 2.4 Assigning the offset to incoming records


Since new records always go at the end of the file, they are in order by offset.
Kafka guarantees that records are in order within a partition but not across
partitions. Since records are in order by offset, we could also be tempted to
think they are in order by time, but that’s not necessarily the case. The
records are in order by their arrival time at the broker, but not necessarily by
event time. We’ll get more into time semantics in the chapter on clients when
we discuss timestamps. We’ll also cover event-time processing in depth when
we get to the chapters on Kafka Streams.

Consumers use offsets to track the position of records they've already consumed. That way, the broker fetches records starting with an offset one higher than the last one read by a consumer. Let's look at an illustration to explain how offsets work:

Figure 2.5 Offsets indicate where a consumer has left off reading records

In the illustration, if a consumer reads records with offsets 0-5, the broker only fetches records starting at offset 6 in the following consumer request. The offsets used are unique for each consumer and are stored in an internal topic named __consumer_offsets. We'll go into more detail about consumers and offsets in chapter four.
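
As a hedged sketch of how offsets drive a consumer's position, the snippet below looks up the committed offset for one partition and seeks to it. It assumes an already-configured KafkaConsumer (see the earlier consuming sketch) and reuses the purchases topic name for illustration.

import java.time.Duration;
import java.util.Map;
import java.util.Set;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class OffsetSketch {
    // The consumer is assumed to be configured as in the earlier consuming sketch
    static void resumeFromCommitted(KafkaConsumer<String, String> consumer) {
        TopicPartition purchases0 = new TopicPartition("purchases", 0);
        consumer.assign(Set.of(purchases0));

        // The committed offset, stored in the internal __consumer_offsets topic,
        // marks where this consumer group left off reading
        Map<TopicPartition, OffsetAndMetadata> committed = consumer.committed(Set.of(purchases0));
        OffsetAndMetadata last = committed.get(purchases0);
        long resumeAt = (last == null) ? 0L : last.offset();

        consumer.seek(purchases0, resumeAt);    // the next fetch starts at this offset
        consumer.poll(Duration.ofMillis(500));  // records come back beginning at resumeAt
    }
}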

Now that we’ve covered topics, partitions, and offsets, let’s quickly discuss
some trade-offs regarding the number of partitions to use.

2.3.2 Determining the correct number of partitions


Choosing the number of partitions to use when creating a topic is part art and
part science. One of the critical considerations is the amount of data flowing
into a given topic. More data implies more partitions for higher throughput.
But as with anything in life, there are trade-offs.

Increasing the number of partitions increases the number of TCP connections and open file handles. How long it takes to process an incoming record in a consumer will also determine throughput. If you have heavyweight processing in your consumer, adding more partitions may help, but the slower processing will ultimately hinder performance.[1]

Here are some things to consider when setting the number of partitions. You
want to choose a number high enough to cover high-throughput situations,
but not so high that you hit limits on the number of partitions a broker can
handle as you create more and more topics. A good starting point could be
30: it's evenly divisible by several numbers, which results in a more even
distribution of keys in the processing layer.[2] We'll talk more about the
importance of key distribution in later chapters on clients and Kafka Streams.
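As an example (the topic name high-volume-topic is hypothetical, and the
commands assume the local Docker broker you'll set up in section 2.4), you
could create a topic with 30 partitions and then confirm the partition count
like this:

kafka-topics --create --topic high-volume-topic \
  --bootstrap-server localhost:9092 \
  --replication-factor 1 \
  --partitions 30

kafka-topics --describe --topic high-volume-topic \
  --bootstrap-server localhost:9092

The --describe output prints one line per partition, which is a handy way to
double-check the partition count you ended up with.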

At this point, you’ve learned that the broker handles client requests and is the
storage layer for the Kafka event streaming platform. You’ve also learned
about topics, partitions, and their role in the storage layer.

Your next step is to get your hands dirty, producing and consuming records
to see these concepts in action.

Note

We’ll cover the producer and consumer clients in Chapter 4. Console clients
are helpful for learning, quick prototypes, and debugging. But in practice,
you’ll use the clients in your code.

2.4 Sending your first messages


You'll need to run a Kafka broker to run the following examples. In the
previous edition of this book, the instructions were to download a binary
version of the Kafka tar file and extract it locally. In this edition, I've opted to
run Kafka via Docker instead. Specifically, we'll use Docker Compose, which
makes running a multi-container Docker application easy. If you are running
macOS or Windows, you can install Docker Desktop, which includes Docker
Compose. For more information on installing Docker, see the installation
instructions on the Docker site: https://docs.docker.com/get-docker/.

Let’s start working with a Kafka broker by producing and consuming some
records.

2.4.1 Creating a topic


Your first step for producing or consuming records is to create a topic. But
you'll need a running Kafka broker to do that, so let's take care of that now. I
assume you've already installed Docker at this point. To start Kafka,
download the docker-compose.yml file from the source code repo here
TODO-create GitHub repo. After downloading the file, open a new terminal
window, cd to the directory with the docker-compose.yml file, and run this
command: docker-compose up -d.

Tip

Starting docker-compose with the -d flag runs the Docker services in the
background. While it's OK to start docker-compose without the -d flag, the
containers print their output to the terminal, so you need to open a new
terminal window to do any further operations.
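If you want to confirm that the containers came up, two standard Docker
Compose commands are handy (the service name broker matches the one
used in the commands in this section):

docker-compose ps
docker-compose logs broker

The first lists the services and their state; the second shows the broker
container's startup output.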

Wait a few seconds, then run this command to open a shell on the broker
container: docker-compose exec broker bash.

Using the broker container shell you just opened, run this command to
create a topic:
kafka-topics --create --topic first-topic \
  --bootstrap-server localhost:9092 \
  --replication-factor 1 \
  --partitions 1

Important

Although you're using Kafka in a Docker container, the commands to create
topics and run the console producer and consumer are the same.

Since you’re running a local broker for testing, you don’t need a replication
factor greater than 1. The same thing goes for the number of partitions; at this
point, you only need one partition for this local development.
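As an optional check, you can ask the broker to describe the topic you just
created. From the same shell on the broker container, run:

kafka-topics --describe --topic first-topic \
  --bootstrap-server localhost:9092

The output confirms the partition count, the replication factor, and which
broker is the leader for the partition.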

Now that you have a topic, let's write some records to it.

2.4.2 Producing records on the command line


Now, from the same window where you ran the create-topic command, start a
console producer:
kafka-console-producer --topic first-topic \
  --bootstrap-server localhost:9092 \
  --property parse.key=true \
  --property key.separator=":"

When using the console producer, you need to specify whether you will
provide keys. Although Kafka works with key-value pairs, the key is optional
and can be null. Since the key and value go on the same line, you must also
provide a delimiter so the console producer knows how to parse them.

After you enter the above command and hit enter, you should see a prompt
waiting for your input. Enter some text like the following:
key:my first message
key:is something
key:very simple
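
Once you've entered a few records, you can read them back with the console
consumer to confirm they made it into the topic. Chapter 4 covers consumers
properly; this is just a quick check, and the print.key and key.separator
properties below mirror the key handling you configured on the producer.
Press Ctrl-C to exit the producer prompt, then run:

kafka-console-consumer --topic first-topic \
  --bootstrap-server localhost:9092 \
  --from-beginning \
  --property print.key=true \
  --property key.separator=":"

You should see the key:value lines you typed, in the order the broker
appended them to the partition.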