Learning Apache Kafka, 2nd Edition: Start from scratch and learn how to administer Apache Kafka effectively for messaging, by Nishant Garg
www.it-ebooks.info
Learning Apache Kafka Second Edition
Table of Contents
Learning Apache Kafka Second Edition
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Errata
Piracy
Questions
1. Introducing Kafka
Welcome to the world of Apache Kafka
Why do we need Kafka?
Kafka use cases
Installing Kafka
Installing prerequisites
Installing Java 1.7 or higher
Downloading Kafka
Building Kafka
Summary
2. Setting Up a Kafka Cluster
A single node – a single broker cluster
Starting the ZooKeeper server
Starting the Kafka broker
Creating a Kafka topic
Starting a producer to send messages
Starting a consumer to consume messages
A single node – multiple broker clusters
Starting ZooKeeper
Starting the Kafka broker
Creating a Kafka topic using the command line
Starting a producer to send messages
Starting a consumer to consume messages
Multiple nodes – multiple broker clusters
The Kafka broker property list
Summary
3. Kafka Design
Kafka design fundamentals
Log compaction
Message compression in Kafka
Replication in Kafka
Summary
4. Writing Producers
The Java producer API
Simple Java producers
Importing classes
Defining properties
Building the message and sending it
Creating a Java producer with custom partitioning
Importing classes
Defining properties
Implementing the Partitioner class
Building the message and sending it
The Kafka producer property list
Summary
5. Writing Consumers
Kafka consumer APIs
The high-level consumer API
The low-level consumer API
Simple Java consumers
Importing classes
Defining properties
Reading messages from a topic and printing them
Multithreaded Java consumers
Importing classes
Defining properties
Reading the message from threads and printing it
The Kafka consumer property list
Summary
6. Kafka Integrations
Kafka integration with Storm
Introducing Storm
Integrating Storm
Kafka integration with Hadoop
Introducing Hadoop
Integrating Hadoop
Hadoop producers
Hadoop consumers
Summary
7. Operationalizing Kafka
Kafka administration tools
Kafka cluster tools
Adding servers
Kafka topic tools
Kafka cluster mirroring
Integration with other tools
Summary
Index
Learning Apache Kafka Second Edition
Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system,
or transmitted in any form or by any means, without the prior written permission of the
publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the
information presented. However, the information contained in this book is sold without
warranty, either express or implied. Neither the author, nor Packt Publishing, and its
dealers and distributors will be held liable for any damages caused or alleged to be caused
directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the
companies and products mentioned in this book by the appropriate use of capitals.
However, Packt Publishing cannot guarantee the accuracy of this information.
First published: October 2013
Second edition: February 2015
Production reference: 1210215
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78439-309-0
www.packtpub.com
Credits
Author
Nishant Garg
Reviewers
Sandeep Khurana
Saurabh Minni
Supreet Sethi
Commissioning Editor
Usha Iyer
Acquisition Editor
Meeta Rajani
Content Development Editor
Shubhangi Dhamgaye
Technical Editors
Manal Pednekar
Chinmay S. Puranik
Copy Editors
Merilyn Pereira
Aarti Saldanha
Project Coordinator
Harshal Ved
Proofreaders
Stephen Copestake
Paul Hindle
Indexer
Rekha Nair
Graphics
Sheetal Aute
Production Coordinator
Nilesh R. Mohite
Cover Work
Nilesh R. Mohite
About the Author
Nishant Garg has over 14 years of software architecture and development experience in
various technologies, such as Java Enterprise Edition, SOA, Spring, Hadoop, Hive, Flume,
Sqoop, Oozie, Spark, Shark, YARN, Impala, Kafka, Storm, Solr/Lucene, NoSQL
databases (such as HBase, Cassandra, and MongoDB), and MPP databases (such as
GreenPlum).
He received his MS in software systems from the Birla Institute of Technology and
Science, Pilani, India, and is currently working as a technical architect for the Big Data
R&D Group with Impetus Infotech Pvt. Ltd. Previously, Nishant has enjoyed working
with some of the most recognizable names in IT services and financial industries,
employing full software life cycle methodologies such as Agile and SCRUM.
Nishant has also undertaken many speaking engagements on big data technologies and is
the author of HBase Essentials, Packt Publishing.
I would like to thank my parents (Mr. Vishnu Murti Garg and Mrs. Vimla Garg) for their
continuous encouragement and motivation throughout my life. I would also like to thank
my wife (Himani) and my kids (Nitigya and Darsh) for their never-ending support, which
keeps me going.
Finally, I would like to thank Vineet Tyagi, CTO and Head of Innovation Labs, Impetus,
and Dr. Vijay, Director of Technology, Innovation Labs, Impetus, for encouraging me to
write.
About the Reviewers
Sandeep Khurana, an 18-year veteran, has extensive experience in the
software and IT industry. Being an early entrant in the domain, he has worked in all
aspects of Java- and JEE-based technologies and frameworks such as Spring, Hibernate, JPA,
EJB, security, Struts, and so on. Over his last few professional engagements,
and partly due to his personal interest in consumer-facing analytics, he has been
working in the big data realm and has extensive experience with big data technologies such
as Hadoop, Pig, Hive, ZooKeeper, Flume, Oozie, HBase, and so on.
He has designed, developed, and delivered multiple enterprise-level, highly scalable,
distributed systems during the course of his career. In his long and fruitful professional
life, he has been with some of the biggest names of the industry such as IBM, Oracle,
Yahoo!, and Nokia.
Saurabh Minni is currently working as a technical architect at AdNear. He completed his
BE in computer science at the Global Academy of Technology, Bangalore. He is
passionate about programming and loves getting his hands wet with different technologies.
At AdNear, he deployed Kafka. This enabled smooth consumption of data to be processed
by Storm and Hadoop clusters. Prior to AdNear, he worked with Adobe and Intuit, where
he dabbled with C++, Delphi, Android, and Java while working on desktop and mobile
products.
Supreet Sethi is a seasoned technology leader with an eye for detail. He has proven
expertise in charting out growth strategies for technology platforms. He currently steers
the platform team to create tools that drive the infrastructure at Jabong. He often reviews
the code base from a performance point of view. These aspects also put him at the helm of
backend systems, APIs that drive mobile apps, mobile web apps, and desktop sites.
The Jabong tech team has been extremely helpful during the review process. They
provided a creative environment where Supreet was able to explore some cutting-edge
technologies such as Apache Kafka.
I would like to thank my daughter, Seher, and my wife, Smriti, for being patient observers
while I spent a few hours every day reviewing this book.
www.PacktPub.com
Support files, eBooks, discount offers, and more
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and
ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as
a print book customer, you are entitled to a discount on the eBook copy. Get in touch with
us at <[email protected]> for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up
for a range of free newsletters and receive exclusive discounts and offers on Packt books
and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt’s online digital
book library. Here, you can search, access, and read Packt’s entire library of books.
Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Free access for Packt account holders
If you have an account with Packt at www.PacktPub.com, you can use this to access
PacktLib today and view 9 entirely free books. Simply use your login credentials for
immediate access.
Preface
This book is here to help you get familiar with Apache Kafka and to solve your challenges
related to the consumption of millions of messages in publisher-subscriber architectures. It
is aimed at getting you started programming with Kafka so that you will have a solid
foundation to dive deep into different types of implementations and integrations for Kafka
producers and consumers.
In addition to an explanation of Apache Kafka, we also spend a chapter exploring Kafka
integration with other technologies such as Apache Hadoop and Apache Storm. Our goal
is to give you an understanding not just of what Apache Kafka is, but also how to use it as
a part of your broader technical infrastructure. Finally, we walk you through
operationalizing Kafka, including its administration.
What this book covers
Chapter 1, Introducing Kafka, discusses how organizations are realizing the real value of
data and evolving the mechanism of collecting and processing it. It also describes how to
install and build Kafka 0.8.x using different versions of Scala.
Chapter 2, Setting Up a Kafka Cluster, describes the steps required to set up a single- or
multi-broker Kafka cluster and shares the Kafka broker properties list.
Chapter 3, Kafka Design, discusses the design concepts used to build the solid foundation
for Kafka. It also talks about how Kafka handles message compression and replication in
detail.
Chapter 4, Writing Producers, provides detailed information about how to write basic
producers and some advanced level Java producers that use message partitioning.
Chapter 5, Writing Consumers, provides detailed information about how to write basic
consumers and some advanced level Java consumers that consume messages from the
partitions.
Chapter 6, Kafka Integrations, provides a short introduction to both Storm and Hadoop
and discusses how Kafka integration works for both Storm and Hadoop to address real-
time and batch processing needs.
Chapter 7, Operationalizing Kafka, describes the Kafka tools required
for cluster administration and cluster mirroring, and explains how to
integrate Kafka with Camus, Apache Camel, Amazon Cloud, and so on.
What you need for this book
In the simplest case, a single Linux-based (CentOS 6.x) machine with JDK 1.6 installed
provides a platform for almost all the exercises in this book. We assume you are
familiar with the Linux command line, so any modern distribution will suffice.
Some of the examples need multiple machines to see things working, so you will require
access to at least three such hosts; virtual machines are fine for learning and exploration.
Because we also discuss big data technologies such as Hadoop and Storm, you will
also need a place to run your Hadoop and Storm clusters.
Who this book is for
This book is for those who want to know about Apache Kafka at a hands-on level; the key
audience is those with software development experience but no prior exposure to Apache
Kafka or similar technologies.
This book is also for enterprise application developers and big data enthusiasts who have
worked with other publisher-subscriber-based systems and now want to explore Apache
Kafka as a futuristic scalable solution.
Conventions
In this book, you will find a number of styles of text that distinguish between different
kinds of information. Here are some examples of these styles, and an explanation of their
meaning.
Code words in text are shown as follows: “Download the jdk-7u67-linux-x64.rpm
release from Oracle’s website.”
A block of code is set as follows:
String messageStr = new String("Hello from Java Producer");
KeyedMessage<Integer, String> data =
    new KeyedMessage<Integer, String>(topic, messageStr);
producer.send(data);
When we wish to draw your attention to a particular part of a code block, the relevant
lines or items are set in bold:
Properties props = new Properties();
props.put("metadata.broker.list","localhost:9092");
props.put("serializer.class","kafka.serializer.StringEncoder");
props.put("request.required.acks", "1");
ProducerConfig config = new ProducerConfig(props);
Producer<Integer, String> producer =
    new Producer<Integer, String>(config);
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this
book—what you liked or may have disliked. Reader feedback is important for us to
develop titles that you really get the most out of.
To send us general feedback, simply send an e-mail to <[email protected]>, and
mention the book title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or
contributing to a book, see our author guide on www.packtpub.com/authors.
Customer support
Now that you are the proud owner of a Packt book, we have a number of things to help
you to get the most from your purchase.
Errata
Although we have taken every care to ensure the accuracy of our content, mistakes do
happen. If you find a mistake in one of our books—maybe a mistake in the text or the
code—we would be grateful if you would report this to us. By doing so, you can save
other readers from frustration and help us improve subsequent versions of this book. If
you find any errata, please report them by visiting http://www.packtpub.com/submit-errata,
selecting your book, clicking on the errata submission form link, and entering the
details of your errata. Once your errata are verified, your submission will be accepted and
the errata will be uploaded on our website, or added to any list of existing errata, under the
Errata section of that title. Any existing errata can be viewed by selecting your title from
http://www.packtpub.com/support.
Piracy
Piracy of copyright material on the Internet is an ongoing problem across all media. At
Packt, we take the protection of our copyright and licenses very seriously. If you come
across any illegal copies of our works, in any form, on the Internet, please provide us with
the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated
material.
We appreciate your help in protecting our authors, and our ability to bring you valuable
content.
Questions
You can contact us at <[email protected]> if you are having a problem with any
aspect of the book, and we will do our best to address it.
Chapter 1. Introducing Kafka
In today’s world, real-time information is continuously being generated by applications
(business, social, or any other type), and this information needs easy ways to be reliably
and quickly routed to multiple types of receivers. Most of the time, the applications that
produce information and the applications that consume it are far apart
and inaccessible to each other. This heterogeneity leads to redevelopment work
just to provide an integration point between them. Therefore, a mechanism is required for
the seamless integration of information between producers and consumers that avoids any kind
of application rewriting at either end.
Welcome to the world of Apache Kafka
In the present big-data era, the first challenge is to collect the data, because of its sheer
volume, and the second challenge is to analyze it. This analysis typically includes
the following types of data and much more:
User behavior data
Application performance tracing
Activity data in the form of logs
Event messages
Message publishing is a mechanism for connecting various applications with the help of
messages that are routed between them, for example, by a message broker such as Kafka.
Kafka addresses the real-time problem that many software systems face: handling
large volumes of information as they arrive and routing them to multiple consumers
quickly. Kafka provides seamless integration between producers and
consumers of information without blocking the producers and without requiring
producers to know who the final consumers are.
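The decoupling described above can be illustrated with a toy in-memory model. This is a minimal sketch, not Kafka's client API: the class `ToyTopic` and its methods `publish` and `poll` are hypothetical names invented for this illustration. The producer appends to a log and returns immediately, while each consumer tracks its own read position, so the producer never needs to know who is reading.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of a topic; NOT part of the Kafka API.
class ToyTopic {
    private final List<String> log = new ArrayList<>();
    // Each consumer's next unread position, keyed by consumer name.
    private final Map<String, Integer> positions = new HashMap<>();

    // Producer side: append and return at once; no consumer is consulted.
    synchronized void publish(String message) {
        log.add(message);
    }

    // Consumer side: return every message this consumer has not yet seen.
    synchronized List<String> poll(String consumer) {
        int from = positions.getOrDefault(consumer, 0);
        List<String> unread = new ArrayList<>(log.subList(from, log.size()));
        positions.put(consumer, log.size());
        return unread;
    }

    public static void main(String[] args) {
        ToyTopic topic = new ToyTopic();
        topic.publish("login");
        topic.publish("click");
        System.out.println(topic.poll("offline-loader"));   // everything so far
        topic.publish("share");
        System.out.println(topic.poll("offline-loader"));   // only the new message
        System.out.println(topic.poll("realtime-alerts"));  // a new consumer sees all
    }
}
```

Adding a second consumer required no change on the producer side, which is the property the paragraph describes.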
Apache Kafka is an open source, distributed, partitioned, and replicated commit-log-based
publish-subscribe messaging system, mainly designed with the following characteristics:
Persistent messaging: To derive the real value from big data, no
information loss can be afforded. Apache Kafka is designed with O(1) disk
structures that provide constant-time performance even with very large volumes of
stored messages, in the order of TBs. With Kafka, messages are persisted on
disk as well as replicated within the cluster to prevent data loss.
High throughput: Keeping big data in mind, Kafka is designed to work on
commodity hardware and to handle hundreds of MBs of reads and writes per second
from a large number of clients.
Distributed: Apache Kafka, with its cluster-centric design, explicitly supports
message partitioning over Kafka servers and distributing consumption over a cluster
of consumer machines while maintaining per-partition ordering semantics. A Kafka
cluster can grow elastically and transparently without any downtime.
Multiple client support: The Apache Kafka system supports easy integration of
clients from different platforms such as Java, .NET, PHP, Ruby, and Python.
Real time: Messages produced by the producer threads should be immediately
visible to consumer threads; this feature is critical to event-based systems such as
Complex Event Processing (CEP) systems.
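Two of the characteristics above, constant-time access to stored messages and per-partition ordering, can be sketched with one append-only log per partition. This is an illustrative in-memory model under stated assumptions, not Kafka's implementation: `ToyPartitionedLog`, `partitionFor`, `append`, and `read` are hypothetical names, the real system persists messages to disk, and the hash-modulo rule for choosing a partition is simply one common choice assumed here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model; real Kafka persists each partition to disk.
class ToyPartitionedLog {
    private final List<List<String>> partitions = new ArrayList<>();

    ToyPartitionedLog(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // Assumed rule: the same key always maps to the same partition.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    // Append to the end of the chosen partition and return the message's
    // offset, that is, its position within that partition.
    long append(String key, String message) {
        List<String> partition = partitions.get(partitionFor(key));
        partition.add(message);
        return partition.size() - 1;
    }

    // Lookup by (partition, offset) is a direct index, so its cost does not
    // depend on how many messages are stored.
    String read(int partition, long offset) {
        return partitions.get(partition).get((int) offset);
    }

    public static void main(String[] args) {
        ToyPartitionedLog log = new ToyPartitionedLog(3);
        int p = log.partitionFor("user-42");
        long first = log.append("user-42", "login");
        long second = log.append("user-42", "click");
        // Messages with the same key keep their order within one partition.
        System.out.println(log.read(p, first) + " then " + log.read(p, second));
    }
}
```

Ordering is guaranteed only within a partition in this model; messages with different keys may land in different partitions and carry no relative order.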
Kafka provides a real-time publish-subscribe solution that overcomes the challenges of
consuming real-time and batch data volumes that may grow to be orders of magnitude
larger than the real data. Kafka also supports parallel data loading into Hadoop
systems.
The following diagram shows a typical big data aggregation-and-analysis scenario
supported by the Apache Kafka messaging system:
On the production side, there are different kinds of producers, such as the following:
Frontend web applications generating application logs
Producer proxies generating web analytics logs
Producer adapters generating transformation logs
Producer services generating invocation trace logs
On the consumption side, there are different kinds of consumers, such as the following:
Offline consumers that are consuming messages and storing them in Hadoop or
a traditional data warehouse for offline analysis
Near real-time consumers that are consuming messages and storing them in any
NoSQL datastore, such as HBase or Cassandra, for near real-time analytics
Real-time consumers, such as Spark or Storm, that filter messages in-memory and
trigger alert events for related groups
Why do we need Kafka?
A large amount of data is generated by companies having any form of web- or
device-based presence and activity. Data is one of the newer ingredients in these
Internet-based systems and typically includes user activity events corresponding to
logins, page visits, and clicks; social networking activities such as likes, shares,
and comments; and operational and system metrics. Because of its high throughput
(millions of messages per second), this data is typically handled by logging and
traditional log aggregation solutions. These traditional solutions are viable for
providing logging data to an offline analysis system such as Hadoop. However, they
are very limiting for building real-time processing systems.
According to the new trends in Internet applications, activity data has become a part of
production data and is used to run analytics in real time. These analytics can be:
Search based on relevance
Recommendations based on popularity, co-occurrence, or sentiment analysis
Delivering advertisements to the masses
Internet application security from spam or unauthorized data scraping
Device sensors sending high-temperature alerts
Any abnormal user behavior or application hacking
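As an illustration of one item in the list above, a real-time analytic over sensor readings can be as simple as a filter applied to each message as it arrives. The sketch below is hypothetical: the `Reading` and `ToyAlerting` classes and the 80-degree threshold are assumptions made for illustration, and the messages are held in a plain list rather than read from Kafka.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical message type for a device sensor reading.
class Reading {
    final String deviceId;
    final double celsius;

    Reading(String deviceId, double celsius) {
        this.deviceId = deviceId;
        this.celsius = celsius;
    }
}

class ToyAlerting {
    // Return the IDs of devices whose reading is strictly above the
    // threshold, in arrival order.
    static List<String> highTemperatureAlerts(List<Reading> readings, double threshold) {
        List<String> alerts = new ArrayList<>();
        for (Reading r : readings) {
            if (r.celsius > threshold) {
                alerts.add(r.deviceId);
            }
        }
        return alerts;
    }

    public static void main(String[] args) {
        List<Reading> batch = Arrays.asList(
            new Reading("sensor-a", 21.5),
            new Reading("sensor-b", 93.0),
            new Reading("sensor-c", 80.0));
        // Only sensor-b is strictly above the assumed 80-degree threshold.
        System.out.println(highTemperatureAlerts(batch, 80.0));
    }
}
```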
Real-time usage of these multiple sets of data collected from production systems has
become a challenge because of the volume of data collected and processed.
Apache Kafka aims to unify offline and online processing by providing a mechanism for
parallel load into Hadoop systems as well as the ability to partition real-time consumption
over a cluster of machines. Kafka can be compared with Scribe or Flume as it is useful for
processing activity stream data; but from the architecture perspective, it is closer to
traditional messaging systems such as ActiveMQ or RabbitMQ.
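The idea of partitioning real-time consumption over a cluster of machines can be sketched as an assignment of topic partitions to consumers. This is a simplified illustration, not Kafka's actual rebalancing algorithm: `ToyAssignment` and `assign` are hypothetical names, and the round-robin rule is an assumption chosen for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ToyAssignment {
    // Spread partitions 0..partitionCount-1 over the consumers round-robin,
    // so each partition is read by exactly one consumer.
    // Assumes the consumer list is non-empty.
    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        Map<String, List<Integer>> owned = new LinkedHashMap<>();
        for (String c : consumers) {
            owned.put(c, new ArrayList<>());
        }
        for (int p = 0; p < partitionCount; p++) {
            String owner = consumers.get(p % consumers.size());
            owned.get(owner).add(p);
        }
        return owned;
    }

    public static void main(String[] args) {
        // Five partitions shared between two consumer machines.
        System.out.println(assign(Arrays.asList("machine-1", "machine-2"), 5));
    }
}
```

Because each partition has a single owner in this model, adding machines spreads the load without two consumers processing the same message.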