Zookeeper Tutorial
The ZooKeeper framework was originally built at “Yahoo!” to make access to their applications
easy and robust. Later, Apache ZooKeeper became a standard coordination service
used by Hadoop, HBase, and other distributed frameworks. For example, Apache HBase uses
ZooKeeper to track the status of distributed data. This tutorial explains the basics of ZooKeeper,
shows how to install and deploy a ZooKeeper cluster in a distributed environment, and concludes
with a few examples using Java programming and sample applications.
Audience
This tutorial has been prepared for professionals aspiring to make a career in Big Data Analytics
using ZooKeeper framework. It will give you enough understanding on how to use ZooKeeper to
create distributed clusters.
Prerequisites
Before proceeding with this tutorial, you must have a good understanding of Java (the
ZooKeeper server runs on the JVM), distributed processes, and the Linux environment.
All the content and graphics published in this e-book are the property of Tutorials Point (I) Pvt.
Ltd. The user of this e-book is prohibited to reuse, retain, copy, distribute or republish any
contents or a part of contents of this e-book in any manner without written consent of the
publisher.
We strive to update the contents of our website and tutorials as timely and as precisely as
possible, however, the contents may contain inaccuracies or errors. Tutorials Point (I) Pvt. Ltd.
provides no guarantee regarding the accuracy, timeliness or completeness of our website or its
contents including this tutorial. If you discover any errors on our website or in this tutorial,
please notify us at [email protected]
1. ZOOKEEPER – OVERVIEW
ZooKeeper is a distributed co-ordination service to manage a large set of hosts. Co-ordinating and
managing a service in a distributed environment is a complicated process. ZooKeeper solves this
issue with its simple architecture and API. ZooKeeper allows developers to focus on core
application logic without worrying about the distributed nature of the application.
The ZooKeeper framework was originally built at “Yahoo!” to make access to their applications
easy and robust. Later, Apache ZooKeeper became a standard coordination service
used by Hadoop, HBase, and other distributed frameworks. For example, Apache HBase uses
ZooKeeper to track the status of distributed data.
Before moving further, it is important that we know a thing or two about distributed applications.
So, let us start the discussion with a quick overview of distributed applications.
Distributed Application
A distributed application can run on multiple systems in a network at a given time
(simultaneously) by coordinating among themselves to complete a particular task in a fast and
efficient manner. Normally, complex and time-consuming tasks that would take hours to
complete in a non-distributed application (running on a single system) can be done in minutes
by a distributed application that uses the computing capabilities of all the systems involved.
The time to complete the task can be further reduced by configuring the distributed application
to run on more systems. A group of systems in which a distributed application is running is called
a Cluster and each machine running in a cluster is called a Node.
A distributed application has two parts, Server and Client application. Server applications are
actually distributed and have a common interface so that clients can connect to any server in
the cluster and get the same result. Client applications are the tools to interact with a distributed
application.
Transparency – Hides the complexity of the system and shows itself as a single entity
/ application.
Deadlock – Two or more operations waiting for each other to complete indefinitely.
Naming service – Identifying the nodes in a cluster by name. It is similar to DNS, but
for nodes.
Cluster management – Joining / leaving of a node in a cluster and node status in real
time.
Locking and synchronization service – Locking the data while modifying it. This
mechanism helps with automatic failure recovery while connecting to other distributed
applications like Apache HBase.
Highly reliable data registry – Availability of data even when one or a few nodes are
down.
Distributed applications offer a lot of benefits, but they also pose a few complex and hard-to-crack
challenges. The ZooKeeper framework provides a complete mechanism to overcome these
challenges. Race conditions and deadlocks are handled using a fail-safe synchronization
approach. Another main drawback is inconsistency of data, which ZooKeeper resolves with
atomicity.
Benefits of ZooKeeper
Here are the benefits of using ZooKeeper:
Ordered Messages
Serialization – Encode the data according to specific rules. Ensure your application runs
consistently. This approach can be used in MapReduce to coordinate the queue of
running threads.
Reliability
Atomicity – A data transfer either succeeds or fails completely; no transaction is partial.
2. ZOOKEEPER – FUNDAMENTALS
Before going deep into the working of ZooKeeper, let us take a look at the fundamental concepts
of ZooKeeper. We will discuss the following topics in this chapter:
Architecture
Hierarchical namespace
Session
Watches
Architecture of ZooKeeper
Take a look at the following diagram. It depicts the “Client-Server Architecture” of ZooKeeper.
Each one of the components that is a part of the ZooKeeper architecture has been explained in
the following table.
Part      Description
Server    Server, one of the nodes in our ZooKeeper ensemble, provides all the
          services to clients. Gives acknowledgement to client to inform that the
          server is alive.
Hierarchical Namespace
The following diagram depicts the tree structure of the ZooKeeper file system used for memory
representation. A ZooKeeper node is referred to as a znode. Every znode is identified by a name,
and the names along its path are separated by a slash (/).
In the diagram, first you have a root znode separated by “/”. Under root, you have two
logical namespaces config and workers.
The config namespace is used for centralized configuration management and the
workers namespace is used for naming.
Under the config namespace, each znode can store up to 1 MB of data. This is similar to a UNIX
file system except that a parent znode can store data as well. The main purpose of this
structure is to store synchronized data and describe the metadata of the znode. This
structure is called the ZooKeeper Data Model.
Every znode in the ZooKeeper data model maintains a stat structure. A stat simply provides
the metadata of a znode. It consists of Version number, Access control list (ACL), Timestamp,
and Data length.
Version number: Every znode has a version number. Every time the data
associated with the znode changes, its version number also
increases. The version number is important when multiple ZooKeeper clients are
trying to perform operations on the same znode.
Access Control List (ACL): An ACL is basically an access-control mechanism for
the znode. It governs all the znode read and write operations.
Timestamp: Timestamp represents the time elapsed from znode creation and modification.
It is usually represented in milliseconds. ZooKeeper identifies every change to the znodes
by a “Transaction ID” (zxid). A zxid is unique and maintains time for each transaction so
that you can easily identify the time elapsed from one request to another request.
Data length: The total amount of data stored in a znode is the data length. You can
store a maximum of 1 MB of data.
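The role of the version number when several clients update the same znode can be sketched as a compare-and-set. The following is a minimal in-memory model, not the ZooKeeper client API; the class VersionedZnode and its methods are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical in-memory model of a single znode; NOT the ZooKeeper client API.
final class VersionedZnode {
    private byte[] data;
    private int version = 0; // the model starts every znode at data version 0

    VersionedZnode(byte[] initial) {
        this.data = initial.clone();
    }

    // Applies the update only if the caller's expected version matches the current one.
    // Returns false when another client has changed the data in the meantime.
    synchronized boolean setData(byte[] newData, int expectedVersion) {
        if (expectedVersion != version) {
            return false;
        }
        data = newData.clone();
        version++;
        return true;
    }

    synchronized int getVersion() {
        return version;
    }

    synchronized String getDataAsString() {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        VersionedZnode znode = new VersionedZnode("v0".getBytes(StandardCharsets.UTF_8));
        int seenByA = znode.getVersion(); // both clients read version 0
        int seenByB = znode.getVersion();
        System.out.println(znode.setData("from-A".getBytes(StandardCharsets.UTF_8), seenByA)); // true
        System.out.println(znode.setData("from-B".getBytes(StandardCharsets.UTF_8), seenByB)); // false
        System.out.println(znode.getDataAsString() + " @ version " + znode.getVersion());
    }
}
```

Client B's write is rejected because it was based on a stale version, so B has to re-read the znode before retrying.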
Types of Znodes
Znodes are categorized as persistent, sequential, and ephemeral.
Persistent znode: A persistent znode stays alive even after the client that created
it is disconnected. By default, all znodes are persistent unless otherwise
specified.
Ephemeral znode: Ephemeral znodes are active only as long as the client is alive. When a client
gets disconnected from the ZooKeeper ensemble, its ephemeral znodes get deleted
automatically. For this reason, ephemeral znodes are not allowed to have
children. If an ephemeral znode is deleted, then the next suitable node will fill its position.
Ephemeral znodes play an important role in Leader election.
Sessions
Sessions are very important for the operation of ZooKeeper. Requests in a session are executed
in FIFO order. Once a client connects to a server, the session will be established and a session
id is assigned to the client.
The client sends heartbeats at a particular time interval to keep the session valid. If the
ZooKeeper ensemble does not receive heartbeats from a client for more than the period (session
timeout) specified at the starting of the service, it decides that the client died.
Session timeouts are usually represented in milliseconds. When a session ends for any reason,
the ephemeral znodes created during that session also get deleted.
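The relationship between heartbeats, session timeout, and ephemeral znodes can be sketched with a small in-memory model. This is only an illustration under simplifying assumptions (a single server, explicit timestamps passed in by the caller); SessionModel and its methods are hypothetical and are not part of the ZooKeeper API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of sessions and ephemeral znodes; NOT the ZooKeeper API.
final class SessionModel {
    private final long sessionTimeoutMs;
    private final Map<Long, Long> lastHeartbeat = new HashMap<>();    // session id -> last heartbeat time
    private final Map<String, Long> ephemeralOwner = new TreeMap<>(); // znode path -> owning session id
    private long nextSessionId = 1;

    SessionModel(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    // Establishes a session and returns its id.
    long connect(long nowMs) {
        long id = nextSessionId++;
        lastHeartbeat.put(id, nowMs);
        return id;
    }

    // Records a heartbeat for a session that is still alive.
    void heartbeat(long sessionId, long nowMs) {
        if (lastHeartbeat.containsKey(sessionId)) {
            lastHeartbeat.put(sessionId, nowMs);
        }
    }

    void createEphemeral(String path, long sessionId) {
        ephemeralOwner.put(path, sessionId);
    }

    // Expires sessions whose last heartbeat is older than the timeout,
    // then deletes the ephemeral znodes owned by expired sessions.
    void expireSessions(long nowMs) {
        lastHeartbeat.entrySet().removeIf(e -> nowMs - e.getValue() > sessionTimeoutMs);
        ephemeralOwner.values().removeIf(owner -> !lastHeartbeat.containsKey(owner));
    }

    boolean exists(String path) {
        return ephemeralOwner.containsKey(path);
    }

    public static void main(String[] args) {
        SessionModel model = new SessionModel(1000);
        long a = model.connect(0);
        long b = model.connect(0);
        model.createEphemeral("/workers/a", a);
        model.createEphemeral("/workers/b", b);
        model.heartbeat(a, 900);    // only session a keeps sending heartbeats
        model.expireSessions(1500); // session b has been silent for 1500 ms
        System.out.println(model.exists("/workers/a")); // true
        System.out.println(model.exists("/workers/b")); // false
    }
}
```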
Watches
Watches are a simple mechanism for the client to get notifications about the changes in the
ZooKeeper ensemble. Clients can set watches while reading a particular znode. A watch sends a
notification to the registered client when the znode on which the client registered changes.
Znode changes are modifications of the data associated with the znode or changes in the znode’s
children. Watches are triggered only once. If a client wants a notification again, it must set the
watch again through another read operation. When a session expires, the client is
disconnected from the server and the associated watches are also removed.
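The one-time nature of watches can be sketched with a small in-memory model. It assumes a single process and uses a plain callback in place of the real Watcher interface; OneShotWatches and its methods are hypothetical names, not the ZooKeeper API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory model of one-time watches; NOT the ZooKeeper API.
final class OneShotWatches {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, List<Consumer<String>>> watches = new HashMap<>();

    // Reads the znode data; a non-null watcher is registered for the next change only.
    String getData(String path, Consumer<String> watcher) {
        if (watcher != null) {
            watches.computeIfAbsent(path, p -> new ArrayList<>()).add(watcher);
        }
        return data.get(path);
    }

    // Writes the data, then fires and discards every watch registered on the path.
    void setData(String path, String value) {
        data.put(path, value);
        List<Consumer<String>> fired = watches.remove(path);
        if (fired != null) {
            fired.forEach(w -> w.accept(path));
        }
    }

    public static void main(String[] args) {
        OneShotWatches zk = new OneShotWatches();
        List<String> events = new ArrayList<>();
        zk.getData("/config", events::add); // read and set a watch
        zk.setData("/config", "v1");        // the watch fires
        zk.setData("/config", "v2");        // no watch is left, nothing fires
        System.out.println(events.size());  // 1
        zk.getData("/config", events::add); // another read re-registers the watch
        zk.setData("/config", "v3");
        System.out.println(events.size());  // 2
    }
}
```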
3. ZOOKEEPER – WORKFLOW
Once a ZooKeeper ensemble starts, it will wait for the clients to connect. Clients will connect to
one of the nodes in the ZooKeeper ensemble. It may be a leader or a follower node. Once a
client is connected, the node assigns a session ID to the particular client and sends an
acknowledgement to the client. If the client does not get an acknowledgment, it simply tries to
connect another node in the ZooKeeper ensemble. Once connected to a node, the client will send
heartbeats to the node in a regular interval to make sure that the connection is not lost.
If a client wants to read a particular znode, it sends a read request to the node
with the znode path and the node returns the requested znode by getting it from its own
database. For this reason, reads are fast in ZooKeeper ensemble.
If a client wants to store data in the ZooKeeper ensemble, it sends the znode path
and the data to the server. The connected server will forward the request to the leader
and then the leader will reissue the writing request to all the followers. If a majority
of the nodes respond successfully, then the write request will succeed and a successful
return code will be sent to the client. Otherwise, the write request will fail. This strict
majority of nodes is called a Quorum.
If we have a single node, then the ZooKeeper ensemble fails when that node fails. It
contributes to “Single Point of Failure” and it is not recommended in a production
environment.
If we have two nodes and one node fails, we don’t have majority as well, since one out
of two is not a majority.
If we have three nodes and one node fails, we have majority and so, it is the minimum
requirement. It is mandatory for a ZooKeeper ensemble to have at least three nodes in
a live production environment.
If we have four nodes and two nodes fail, it fails again and it is similar to having three
nodes. The extra node does not serve any purpose and so, it is better to add nodes in
odd numbers, e.g., 3, 5, 7.
We know that a write process is more expensive than a read process in a ZooKeeper ensemble, since
all the nodes need to write the same data to their databases. So, it is better to have a small number
of nodes (3, 5 or 7) than a large number of nodes for a balanced environment.
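The majority arithmetic behind these recommendations can be checked with a few lines of Java. This is a sketch of the strict-majority rule described above; QuorumMath is a hypothetical helper, not part of ZooKeeper.

```java
// Hypothetical helper for the strict-majority rule; NOT part of the ZooKeeper API.
final class QuorumMath {
    // Smallest number of nodes that forms a strict majority of the ensemble.
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // Number of nodes that can fail while a quorum is still available.
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 7; n++) {
            System.out.println(n + " nodes: quorum = " + quorumSize(n)
                    + ", tolerated failures = " + toleratedFailures(n));
        }
    }
}
```

Three and four nodes both tolerate a single failure, and five and six both tolerate two, which is why odd ensemble sizes are preferred.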
The following diagram depicts the ZooKeeper WorkFlow and the subsequent table explains its
different components.
Component           Description
Write               The write process is handled by the leader node. The leader forwards the
                    write request to all the nodes and waits for their answers. If a majority
                    of the nodes reply, then the write process is complete.
Replicated          It is used to store data in ZooKeeper. Each node has its own database and
Database            every node has the same data at all times with the help of consistency.
Leader              The leader is the node that is responsible for processing write requests.
Follower            Followers receive write requests from the clients and forward them to the
                    leader node.
Request Processor   Present only in the leader node. It governs write requests from the
                    follower nodes.
Atomic broadcasts   Responsible for broadcasting the changes from the leader node to the
                    follower nodes.
4. ZOOKEEPER – LEADER ELECTION
Let us analyze how a leader node can be elected in a ZooKeeper ensemble. Consider there are
N number of nodes in a cluster. The process of leader election is as follows:
1. All the nodes create a sequential, ephemeral znode with the same path,
/app/leader_election/guid_.
2. ZooKeeper ensemble will append the 10-digit sequence number to the path and the znode
created will be /app/leader_election/guid_0000000001,
/app/leader_election/guid_0000000002, etc.
3. For a given instance, the node which creates the smallest number in the znode becomes
the leader and all the other nodes are followers.
4. Each follower node watches the znode having the next smallest number. For example,
the node which creates znode /app/leader_election/guid_0000000008 will watch
the znode /app/leader_election/guid_0000000007 and the node which creates the
znode /app/leader_election/guid_0000000007 will watch the znode
/app/leader_election/guid_0000000006.
5. If the leader goes down, then its corresponding znode /app/leader_election/guid_N gets
deleted.
6. The next in line follower node will get the notification through watcher about the leader
removal.
7. The next in line follower node will check if there are other znodes with the smallest
number. If none, then it will assume the role of the leader. Otherwise, it finds the node
which created the znode with the smallest number as leader.
8. Similarly, all other follower nodes elect the node which created the znode with the
smallest number as leader.
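The selection rules in the steps above can be sketched in plain Java. The sketch works on a list of znode names only and does not talk to an ensemble; ElectionSketch and its methods are hypothetical names. It relies on the zero-padded, fixed-width sequence numbers sorting correctly as strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper working on znode names only; NOT part of the ZooKeeper API.
final class ElectionSketch {
    // The leader is the owner of the znode with the smallest sequence number.
    // Fixed-width, zero-padded suffixes sort correctly as plain strings.
    static String leader(List<String> children) {
        return Collections.min(children);
    }

    // Returns the znode that the owner of 'mine' should watch: the one with the
    // next smaller sequence number. Returns null if 'mine' is the smallest
    // (its owner is the leader) or is not in the list.
    static String znodeToWatch(List<String> children, String mine) {
        List<String> sorted = new ArrayList<>(children);
        Collections.sort(sorted);
        int index = sorted.indexOf(mine);
        return index <= 0 ? null : sorted.get(index - 1);
    }

    public static void main(String[] args) {
        List<String> children = new ArrayList<>(Arrays.asList(
                "guid_0000000008", "guid_0000000006", "guid_0000000007"));
        System.out.println(leader(children));                          // guid_0000000006
        System.out.println(znodeToWatch(children, "guid_0000000008")); // guid_0000000007
        children.remove("guid_0000000006"); // the leader's ephemeral znode disappears
        System.out.println(leader(children));                          // guid_0000000007
    }
}
```

Watching only the immediate predecessor means a leader failure notifies one follower instead of all of them.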
Leader election is a complex process when it is done from scratch. But ZooKeeper service makes
it very simple. Let us move on to the installation of ZooKeeper for development purpose in the
next chapter.
5. ZOOKEEPER – INSTALLATION
Before installing ZooKeeper, make sure your system is running on any of the following operating
systems:
ZooKeeper server is created in Java and it runs on JVM. You need to use JDK 6 or greater.
Now, follow the steps given below to install ZooKeeper framework on your machine.
$ java -version
If you have Java installed on your machine, then you could see the version of installed Java.
Otherwise, follow the simple steps given below to install the latest version of Java.
http://www.oracle.com/technetwork/java/javase/downloads/index.html
The latest version (while writing this tutorial) is JDK 8u60 and the file is “jdk-8u60-linux-x64.tar.gz”.
Please download the file on your machine.
$ cd /go/to/download/path
$ tar -zxf jdk-8u60-linux-x64.tar.gz
$ su
Now, apply all the changes into the current running system.
$ source ~/.bashrc
Step 1.6
Verify the Java installation using the verification command (java -version) explained in Step 1.
$ cd opt/
$ tar -zxf zookeeper-3.4.6.tar.gz
$ cd zookeeper-3.4.6
$ mkdir data
$ vi conf/zoo.cfg
tickTime=2000
dataDir=/path/to/zookeeper/data
clientPort=2181
initLimit=5
syncLimit=2
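In ZooKeeper's configuration, initLimit and syncLimit are counted in ticks, so their real durations depend on tickTime (which is given in milliseconds). The sketch below parses the settings shown above with java.util.Properties and converts the two limits to milliseconds. ZooCfgTimings is a hypothetical helper written for this illustration; it is not how the server itself reads zoo.cfg.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration; NOT how the ZooKeeper server reads zoo.cfg.
final class ZooCfgTimings {
    // Parses key=value lines such as the zoo.cfg content shown above.
    static Properties parse(String text) {
        Properties cfg = new Properties();
        try {
            cfg.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cfg;
    }

    // Converts a limit expressed in ticks (initLimit, syncLimit) to milliseconds.
    static long limitInMillis(Properties cfg, String limitKey) {
        long tickTime = Long.parseLong(cfg.getProperty("tickTime"));
        long ticks = Long.parseLong(cfg.getProperty(limitKey));
        return tickTime * ticks;
    }

    public static void main(String[] args) {
        Properties cfg = parse(String.join("\n",
                "tickTime=2000",
                "dataDir=/path/to/zookeeper/data",
                "clientPort=2181",
                "initLimit=5",
                "syncLimit=2"));
        System.out.println("initLimit = " + limitInMillis(cfg, "initLimit") + " ms"); // 10000 ms
        System.out.println("syncLimit = " + limitInMillis(cfg, "syncLimit") + " ms"); // 4000 ms
    }
}
```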
Once the configuration file has been saved successfully, return to the terminal again. You can
now start the zookeeper server.
$ bin/zkServer.sh start
$ bin/zkCli.sh
After typing the above command, you will be connected to the ZooKeeper server and you should
get the following response.
Connecting to localhost:2181
................
................
................
Welcome to ZooKeeper!
................
................
WATCHER::
$ bin/zkServer.sh stop
6. ZOOKEEPER – CLI
ZooKeeper Command Line Interface (CLI) is used to interact with the ZooKeeper ensemble for
development purpose. It is useful for debugging and working around with different options.
To perform ZooKeeper CLI operations, first turn on your ZooKeeper server (“bin/zkServer.sh
start”) and then, ZooKeeper client (“bin/zkCli.sh”). Once the client starts, you can perform the
following operation:
Create znodes
Get data
Watch znode for changes
Set data
Create children of a znode
List children of a znode
Check Status
Remove / Delete a znode
Create Znodes
Create a znode with the given path. The flag argument specifies whether the created znode will
be ephemeral, persistent, or sequential. By default, all znodes are persistent.
For a sequential znode (the -s flag), the ZooKeeper ensemble appends a sequence number,
zero-padded to 10 digits, to the znode path. For example, the znode path /myapp will be
converted to /myapp0000000001 and the next sequence number will be /myapp0000000002.
If no flags are specified, then the znode is considered as persistent.
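The naming rule for sequential znodes can be reproduced with a format string. This is a sketch of the rule as described above; SequentialNames is a hypothetical helper, and in a real ensemble the counter is chosen by the server, not by the client.

```java
import java.util.Locale;

// Hypothetical helper that mimics the naming rule; NOT part of the ZooKeeper API.
final class SequentialNames {
    // Appends the counter to the path, zero-padded to 10 digits.
    static String sequentialPath(String path, int counter) {
        return String.format(Locale.ROOT, "%s%010d", path, counter);
    }

    public static void main(String[] args) {
        System.out.println(sequentialPath("/myapp", 1));       // /myapp0000000001
        System.out.println(sequentialPath("/myapp", 2));       // /myapp0000000002
        System.out.println(sequentialPath("/FirstZnode", 23)); // /FirstZnode0000000023
    }
}
```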
Syntax
create /path /data
Sample
create /FirstZnode "Myfirstzookeeper-app"
Output
[zk: localhost:2181(CONNECTED) 0] create /FirstZnode "Myfirstzookeeper-app"
Created /FirstZnode
Syntax
create -s /path /data
Sample
create -s /FirstZnode "second-data"
Output
[zk: localhost:2181(CONNECTED) 2] create -s /FirstZnode "second-data"
Created /FirstZnode0000000023
Syntax
create -e /path /data
Sample
create -e /SecondZnode "Ephemeral-data"
Output
[zk: localhost:2181(CONNECTED) 2] create -e /SecondZnode "Ephemeral-data"
Created /SecondZnode
Remember when a client connection is lost, the ephemeral znode will be deleted. You can try it
by quitting the ZooKeeper CLI and then re-opening the CLI.
Get Data
It returns the associated data of the znode and metadata of the specified znode. You will get
information such as when the data was last modified, where it was modified, and information
about the data. This CLI is also used to assign watches to show notification about the data.
Syntax
get /path
Sample
get /FirstZnode
Output
[zk: localhost:2181(CONNECTED) 1] get /FirstZnode
"Myfirstzookeeper-app"
cZxid = 0x7f
ctime = Tue Sep 29 16:15:47 IST 2015
mZxid = 0x7f
mtime = Tue Sep 29 16:15:47 IST 2015
pZxid = 0x7f
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 22
numChildren = 0
To access a sequential znode, you must enter the full path of the znode.
Sample
get /FirstZnode0000000023
Output
[zk: localhost:2181(CONNECTED) 1] get /FirstZnode0000000023
"second-data"
cZxid = 0x80
ctime = Tue Sep 29 16:25:47 IST 2015
mZxid = 0x80
mtime = Tue Sep 29 16:25:47 IST 2015
pZxid = 0x80
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 13
numChildren = 0
Watch
Watches show a notification when the data of the specified znode or of its children changes. This
section sets a watch through the get command.
Syntax
get /path [watch] 1
Sample
get /FirstZnode 1
Output
[zk: localhost:2181(CONNECTED) 1] get /FirstZnode 1
"Myfirstzookeeper-app"
cZxid = 0x7f
ctime = Tue Sep 29 16:15:47 IST 2015
mZxid = 0x7f
mtime = Tue Sep 29 16:15:47 IST 2015
pZxid = 0x7f
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 22
numChildren = 0
The output is similar to the normal get command, but the client will wait for znode changes in the
background.
Set Data
Set the data of the specified znode. Once you finish this set operation, you can check the data
using the get CLI command.
Syntax
set /path /data
Sample
set /SecondZnode "Data-updated"
Output
[zk: localhost:2181(CONNECTED) 1] get /SecondZnode
"Data-updated"
cZxid = 0x82
ctime = Tue Sep 29 16:29:50 IST 2015
mZxid = 0x83
mtime = Tue Sep 29 16:29:50 IST 2015
pZxid = 0x82
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x15018b47db00000
dataLength = 14
numChildren = 0
If you assigned the watch option in the get command (as in the previous command), then the output
will be similar to that shown below:
Output
[zk: localhost:2181(CONNECTED) 1] get /FirstZnode
"Mysecondzookeeper-app"
WATCHER::
mZxid = 0x84
mtime = Tue Sep 29 17:14:47 IST 2015
pZxid = 0x7f
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 23
numChildren = 0
Create Children
Syntax
create /parent/path/subnode/path /data
Sample
create /FirstZnode/Child1 "firstchildren"
Output
[zk: localhost:2181(CONNECTED) 16] create /FirstZnode/Child1 "firstchildren"
Created /FirstZnode/Child1
[zk: localhost:2181(CONNECTED) 17] create /FirstZnode/Child2 "secondchildren"
Created /FirstZnode/Child2
List Children
This command is used to list and display the children of a znode.
Syntax
ls /path
Sample
ls /MyFirstZnode
Output
[zk: localhost:2181(CONNECTED) 2] ls /MyFirstZnode
[mysecondsubnode, myfirstsubnode]
Check Status
Status describes the metadata of a specified znode. It contains details such as Timestamp,
Version number, ACL, Data length, and Children znode.
Syntax
stat /path
Sample
stat /FirstZnode
Output
[zk: localhost:2181(CONNECTED) 1] stat /FirstZnode
cZxid = 0x7f
ctime = Tue Sep 29 16:15:47 IST 2015
mZxid = 0x7f
mtime = Tue Sep 29 17:14:24 IST 2015
pZxid = 0x7f
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 23
numChildren = 0
Remove a Znode
Removes a specified znode and recursively all its children. This would happen only if such a
znode is available.
Syntax
rmr /path
Sample
rmr /FirstZnode
Output
[zk: localhost:2181(CONNECTED) 10] rmr /FirstZnode
[zk: localhost:2181(CONNECTED) 11] get /FirstZnode
Node does not exist: /FirstZnode
The delete command (delete /path) is similar to the remove command, except that it works
only on znodes with no children.
7. ZOOKEEPER – API
ZooKeeper has an official API binding for Java and C. The ZooKeeper community provides
unofficial API for most of the languages (.NET, python, etc.). Using ZooKeeper API, an application
can connect, interact, manipulate data, coordinate, and finally disconnect from a ZooKeeper
ensemble.
ZooKeeper API has a rich set of features to get all the functionality of the ZooKeeper ensemble
in a simple and safe manner. ZooKeeper API provides both synchronous and asynchronous
methods.
ZooKeeper ensemble and ZooKeeper API completely complement each other in every aspect and
it benefits the developers in a great way. Let us discuss Java binding in this chapter.
Znode is the core component of ZooKeeper ensemble and ZooKeeper API provides a small set
of methods to manipulate all the details of znode with ZooKeeper ensemble.
A client should follow the steps given below to have a clear and clean interaction with ZooKeeper
ensemble.
Connect to the ZooKeeper ensemble. The ZooKeeper ensemble assigns a Session ID to the
client.
Send heartbeats to the server periodically. Otherwise, the ZooKeeper ensemble expires
the Session ID and the client needs to reconnect.
Disconnect from the ZooKeeper ensemble, once all the tasks are completed. If the client
is inactive for a prolonged time, then the ZooKeeper ensemble will automatically
disconnect the client.
Java Binding
Let us understand the most important set of ZooKeeper API in this chapter. The central part of
the ZooKeeper API is ZooKeeper class. It provides options to connect the ZooKeeper ensemble
in its constructor and has the following methods:
Where,
Let us create a new helper class ZooKeeperConnection and add a method connect. The
connect method creates a ZooKeeper object, connects to the ZooKeeper ensemble, and then
returns the object.
Here CountDownLatch is used to stop (wait) the main process until the client connects with
the ZooKeeper ensemble.
The ZooKeeper ensemble replies the connection status through the Watcher callback. The
Watcher callback will be called once the client connects with the ZooKeeper ensemble and the
Watcher callback calls the countDown method of the CountDownLatch to release the lock,
await in the main process.
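The wait-until-connected handshake described above can be tried without a running ensemble. In the sketch below a background thread stands in for the Watcher callback that the ensemble would trigger; LatchHandshake, connectAndWait, and both delay parameters are hypothetical names introduced for this illustration, and the bounded wait is a design choice of the sketch rather than something the tutorial's helper class does.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-alone sketch of the latch pattern; it does NOT use the ZooKeeper API.
final class LatchHandshake {
    // Returns true if the simulated "connected" callback arrived within maxWaitMs.
    static boolean connectAndWait(long callbackDelayMs, long maxWaitMs) {
        CountDownLatch connectedSignal = new CountDownLatch(1);

        // Stand-in for the Watcher callback invoked once the client is connected.
        Thread callback = new Thread(() -> {
            try {
                Thread.sleep(callbackDelayMs); // simulated connection delay
                connectedSignal.countDown();   // releases the waiting caller
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        callback.setDaemon(true);
        callback.start();

        try {
            // The caller blocks here until the callback fires or the wait times out.
            return connectedSignal.await(maxWaitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(connectAndWait(50, 2000)); // callback arrives in time
        System.out.println(connectAndWait(2000, 50)); // wait gives up first
    }
}
```

An unbounded await() blocks forever if the ensemble is unreachable; a timeout lets the caller report the failure instead.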
Coding: ZooKeeperConnection.java
// import java classes
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.AsyncCallback.StatCallback;
import org.apache.zookeeper.KeeperException.Code;
import org.apache.zookeeper.data.Stat;
connectedSignal.await();
return zoo;
}
Save the above code and it will be used in the next section for connecting the ZooKeeper
ensemble.
Create a Znode
The ZooKeeper class provides create method to create a new znode in the ZooKeeper
ensemble. The signature of the create method is as follows:
create(String path, byte[] data, List<ACL> acl, CreateMode createMode)
Where,
acl – access control list of the node to be created. ZooKeeper API provides a static
interface ZooDefs.Ids to get some of the basic ACL lists. For example,
ZooDefs.Ids.OPEN_ACL_UNSAFE returns a list of ACLs for open znodes.
createMode – the type of node, either ephemeral, sequential, or both. This is an enum.
Let us create a new Java application to check the create functionality of the ZooKeeper API.
Create a file ZKCreate.java. In the main method, create an object of type
ZooKeeperConnection and call the connect method to connect to the ZooKeeper ensemble.
The connect method will return the ZooKeeper object zk. Now, call the create method of zk
object with custom path and data.
Coding: ZKCreate.java
import java.io.IOException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.KeeperState;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
try {
conn = new ZooKeeperConnection();
zk = conn.connect("localhost");
conn.close();
} catch (Exception e) {
System.out.println(e.getMessage()); //Catch error message
}
}
}
Once the application is compiled and executed, a znode with the specified data will be created
in the ZooKeeper ensemble. You can check it using the ZooKeeper CLI zkCli.sh.
cd /path/to/zookeeper
bin/zkCli.sh
>>> get /MyFirstZnode
Where,
Let us create a new Java application to check the “exists” functionality of the ZooKeeper API.
Create a file “ZKExists.java”. In the main method, create ZooKeeper object, “zk” using
“ZooKeeperConnection” object. Then, call “exists” method of “zk” object with custom “path”.
The complete listing is as follows:
Coding: ZKExists.java
import java.io.IOException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.KeeperState;
import org.apache.zookeeper.data.Stat;
try {
conn = new ZooKeeperConnection();
zk = conn.connect("localhost");
if(stat!= null) {
System.out.println("Node exists and the node version is " +
stat.getVersion());
} else {
System.out.println("Node does not exist");
}
}
catch(Exception e) {
System.out.println(e.getMessage()); // Catches error messages
}
}
}
Once the application is compiled and executed, you will get the below output.
getData Method
The ZooKeeper class provides getData method to get the data attached in a specified znode
and its status. The signature of the getData method is as follows:
getData(String path, Watcher watcher, Stat stat)
Where,
watcher – Callback function of type Watcher. The ZooKeeper ensemble will notify
through the Watcher callback when the data of the specified znode changes. This is one-
time notification.
Let us create a new Java application to understand the getData functionality of the ZooKeeper
API. Create a file ZKGetData.java. In the main method, create a ZooKeeper object zk using
the ZooKeeperConnection object. Then, call the getData method of zk object with custom
path.
Here is the complete program code to get the data from a specified node:
Coding: ZKGetData.java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.KeeperState;
import org.apache.zookeeper.data.Stat;
try {
conn = new ZooKeeperConnection();
zk = conn.connect("localhost");
break;
}
} else {
String path = "/MyFirstZnode";
try {
byte[] bn = zk.getData(path,
false, null);
String data = new String(bn,
"UTF-8");
System.out.println(data);
connectedSignal.countDown();
} catch(Exception ex) {
System.out.println(ex.getMessage());
}
}
}
}, null);
String data = new String(b, "UTF-8");
System.out.println(data);
connectedSignal.await();
} else {
System.out.println("Node does not exist");
}
}
catch(Exception e) {
System.out.println(e.getMessage());
}
}
}
Once the application is compiled and executed, you will get the following output.
And the application will wait for further notification from the ZooKeeper ensemble. Change the
data of the specified znode using ZooKeeper CLI zkCli.sh.
cd /path/to/zookeeper
bin/zkCli.sh
>>> set /MyFirstZnode Hello
Now, the application will print the following output and exit.
Hello
setData Method
The ZooKeeper class provides setData method to modify the data attached in a specified znode.
The signature of the setData method is as follows:
setData(String path, byte[] data, int version)
Where,
version – Current version of the znode. ZooKeeper updates the version number of the
znode whenever the data gets changed.
Let us now create a new Java application to understand the setData functionality of the
ZooKeeper API. Create a file ZKSetData.java. In the main method, create a ZooKeeper object
zk using the ZooKeeperConnection object. Then, call the setData method of zk object with
the specified path, new data, and version of the node.
Here is the complete program code to modify the data attached in a specified znode.
Code: ZKSetData.java
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.KeeperState;
import java.io.IOException;
// Method to update the data in a znode. Similar to getData but without watcher.
public static void update(String path, byte[] data) throws
KeeperException,InterruptedException {
zk.setData(path, data, zk.exists(path,true).getVersion());
}
try {
}
catch(Exception e) {
System.out.println(e.getMessage());
}
}
}
Once the application is compiled and executed, the data of the specified znode will be changed
and it can be checked using the ZooKeeper CLI, zkCli.sh.
cd /path/to/zookeeper
bin/zkCli.sh
>>> get /MyFirstZnode
getChildren Method
The ZooKeeper class provides the getChildren method to get all the sub-nodes of a particular znode.
The signature of the getChildren method is as follows:
getChildren(String path, Watcher watcher)
Where,
path – Znode path.
watcher – Callback function of type “Watcher”. The ZooKeeper ensemble will notify when
the specified znode gets deleted or a child under the znode gets created / deleted. This
is a one-time notification.
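The one-time nature of the watch is easy to overlook: after the first notification the watch is gone, and the client must call getChildren again to keep watching. The following in-memory sketch illustrates this behaviour. It is not ZooKeeper client code. The class WatchedParent and its methods are hypothetical, it models only child creation and deletion (not deletion of the parent), and it returns children in sorted order purely to make the output predictable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Consumer;

// In-memory model of a one-time child watch; hypothetical names, not the ZooKeeper API.
class WatchedParent {
    private final TreeSet<String> children = new TreeSet<>();
    private Consumer<String> childWatch; // at most one pending watch in this sketch

    // Returns the current children and, if watcher is non-null, registers a watch.
    List<String> getChildren(Consumer<String> watcher) {
        if (watcher != null) {
            childWatch = watcher;
        }
        return new ArrayList<>(children);
    }

    void createChild(String name) {
        if (children.add(name)) {
            fireWatch();
        }
    }

    void deleteChild(String name) {
        if (children.remove(name)) {
            fireWatch();
        }
    }

    // Delivers the event to the pending watch once, then clears the watch.
    private void fireWatch() {
        Consumer<String> pending = childWatch;
        childWatch = null;
        if (pending != null) {
            pending.accept("NodeChildrenChanged");
        }
    }

    public static void main(String[] args) {
        WatchedParent parent = new WatchedParent();
        List<String> events = new ArrayList<>();

        parent.getChildren(events::add);       // register the watch
        parent.createChild("myfirstsubnode");  // fires the watch
        parent.createChild("mysecondsubnode"); // no watch left, nothing is delivered

        System.out.println(events);                   // [NodeChildrenChanged]
        System.out.println(parent.getChildren(null)); // [myfirstsubnode, mysecondsubnode]
    }
}
```

In this model, as with the real client, the event says only that the children changed; the client calls getChildren again to learn the new list and to set a fresh watch.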
Coding: ZKGetChildren.java
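import java.util.List;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.data.Stat;

public class ZKGetChildren {
   private static ZooKeeper zk;
   private static ZooKeeperConnection conn;

   public static void main(String[] args) {
      String path = "/MyFirstZnode"; // parent znode whose children are listed
      try {
         conn = new ZooKeeperConnection();
         zk = conn.connect("localhost");
         // exists returns null if the znode is not available.
         Stat stat = zk.exists(path, false);
         if(stat!= null) {
            // Get all the children of the znode; false means no watch is set.
            List<String> children = zk.getChildren(path, false);
            for(String child : children)
               System.out.println(child);
         } else {
            System.out.println("Node does not exist");
         }
      }
      catch(Exception e) {
         System.out.println(e.getMessage());
      }
   }
}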
Before running the program, let us create two sub-nodes for /MyFirstZnode using the
ZooKeeper CLI, zkCli.sh.
cd /path/to/zookeeper
bin/zkCli.sh
>>> create /MyFirstZnode/myfirstsubnode Hi
>>> create /MyFirstZnode/mysecondsubnode Hi
Now, compiling and running the program will print the znodes created above.
myfirstsubnode
mysecondsubnode
Delete a Znode
The ZooKeeper class provides the delete method to delete a specified znode. The signature of the
delete method is as follows:
delete(String path, int version)
Where,
path – Znode path.
version – Current version of the znode.
Let us create a new Java application to understand the delete functionality of the ZooKeeper
API. Create a file ZKDelete.java. In the main method, create a ZooKeeper object zk using
ZooKeeperConnection object. Then, call the delete method of zk object with the specified
path and version of the node.
Coding: ZKDelete.java
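import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.KeeperException;
public class ZKDelete {
   public static void main(String[] args) {
      String path = "/MyFirstZnode"; // assumed path; the znode must have no children
      try{
         ZooKeeperConnection conn = new ZooKeeperConnection();
         ZooKeeper zk = conn.connect("localhost");
         // Delete the znode, passing its current version.
         zk.delete(path, zk.exists(path, false).getVersion());
      }
      catch(Exception e) {
         System.out.println(e.getMessage());
      }
   }
}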
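One point worth noting about the delete method: ZooKeeper refuses to delete a znode that still has children (the client reports a NotEmptyException), so a parent such as /MyFirstZnode can be removed only after its sub-nodes. The following in-memory sketch models these rules and a children-first delete. It is not ZooKeeper client code. The class ZnodeTree and its methods are hypothetical, and the exception messages only imitate the names of the real error conditions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// In-memory model of znode deletion; hypothetical names, not the ZooKeeper API.
class ZnodeTree {
    // Maps each existing path to its data version; the root "/" always exists.
    private final TreeMap<String, Integer> versions = new TreeMap<>();

    ZnodeTree() {
        versions.put("/", 0);
    }

    // Simplification: does not check that the parent exists.
    void create(String path) {
        versions.put(path, 0);
    }

    boolean exists(String path) {
        return versions.containsKey(path);
    }

    // Returns the full paths of the direct children of the given path.
    List<String> childrenOf(String path) {
        String prefix = path.equals("/") ? "/" : path + "/";
        List<String> result = new ArrayList<>();
        for (String p : versions.keySet()) {
            boolean below = p.startsWith(prefix) && p.length() > prefix.length();
            if (below && p.indexOf('/', prefix.length()) < 0) {
                result.add(p);
            }
        }
        return result;
    }

    // The znode must exist, the version must match (or be -1), and it must have no children.
    void delete(String path, int version) {
        Integer current = versions.get(path);
        if (current == null) {
            throw new IllegalStateException("NoNode: " + path);
        }
        if (version != -1 && version != current) {
            throw new IllegalStateException("BadVersion: " + path);
        }
        if (!childrenOf(path).isEmpty()) {
            throw new IllegalStateException("NotEmpty: " + path);
        }
        versions.remove(path);
    }

    // Deletes all descendants first (depth-first), then the znode itself.
    void deleteRecursive(String path) {
        for (String child : childrenOf(path)) {
            deleteRecursive(child);
        }
        delete(path, -1);
    }

    public static void main(String[] args) {
        ZnodeTree tree = new ZnodeTree();
        tree.create("/MyFirstZnode");
        tree.create("/MyFirstZnode/myfirstsubnode");
        tree.create("/MyFirstZnode/mysecondsubnode");

        try {
            tree.delete("/MyFirstZnode", 0);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // NotEmpty: /MyFirstZnode
        }

        tree.deleteRecursive("/MyFirstZnode");
        System.out.println(tree.exists("/MyFirstZnode")); // false
    }
}
```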
8. ZOOKEEPER – APPLICATIONS
Yahoo!
The ZooKeeper framework was originally built at “Yahoo!”. A well-designed distributed
application needs to meet requirements such as data transparency, better performance,
robustness, centralized configuration, and coordination. Yahoo! designed the ZooKeeper
framework to meet these requirements.
Apache Hadoop
Apache Hadoop is the driving force behind the growth of the Big Data industry. Hadoop relies on
ZooKeeper for configuration management and coordination. Let us take a scenario to understand
the role of ZooKeeper in Hadoop.
Assume that a Hadoop cluster spans 100 or more commodity servers. Such a cluster needs
coordination and naming services. Because the computation involves a large number of nodes,
each node needs to synchronize with the others, know where to access services, and
know how it should be configured. Hadoop clusters therefore require cross-node
services. ZooKeeper provides the facilities for cross-node synchronization and ensures that the
tasks across Hadoop projects are serialized and synchronized.
Multiple ZooKeeper servers support large Hadoop clusters. Each client machine communicates
with one of the ZooKeeper servers to retrieve and update its synchronization information. Some
real-world examples are:
Human Genome Project – The Human Genome Project contains terabytes of data.
Hadoop MapReduce framework can be used to analyze the dataset and find interesting
facts for human development.
Healthcare – Hospitals can store, retrieve, and analyze huge sets of patient medical
records, which are normally in terabytes.
Apache HBase
Apache HBase is an open source, distributed, NoSQL database used for real-time read/write
access to large datasets; it runs on top of HDFS. HBase follows a master-slave architecture
where the HBase Master governs all the slaves. Slaves are referred to as Region servers.
Some real-world use cases of Apache HBase are:
Telecom – The telecom industry stores billions of mobile call records (around 30TB / month),
and accessing these call records in real time becomes a huge task. HBase can be used to
process all the records in real time, easily and efficiently.
Social network – Similar to the telecom industry, sites like Twitter, LinkedIn, and Facebook
receive huge volumes of data through the posts created by users. HBase can be used to
find recent trends and other interesting facts.
Apache Solr
Apache Solr is a fast, open source search platform written in Java. It is a fault-tolerant,
distributed search engine. Built on top of Lucene, it is a high-performance, full-featured
text search engine.
Solr makes extensive use of ZooKeeper features such as configuration management, leader
election, node management, locking, and synchronization of data.
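Leader election, one of the features listed above, is commonly built on sequential znodes: each candidate creates a znode whose name ends in a 10-digit sequence number assigned by ZooKeeper, and the candidate with the smallest number acts as leader. The following is an in-memory sketch of that general recipe. It is not Solr's actual implementation and does not use the ZooKeeper client; the class name, the method names, and the n_ name prefix are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Sketch of "lowest sequence number wins" leader election; hypothetical names.
class LeaderElectionSketch {
    // Extracts the 10-digit sequence suffix that sequential znode names end with.
    static int sequenceOf(String znodeName) {
        return Integer.parseInt(znodeName.substring(znodeName.length() - 10));
    }

    // The candidate whose znode has the smallest sequence number is the leader.
    static String electLeader(List<String> candidateZnodes) {
        return Collections.min(candidateZnodes,
                Comparator.comparingInt(LeaderElectionSketch::sequenceOf));
    }

    public static void main(String[] args) {
        List<String> live = new ArrayList<>(
                Arrays.asList("n_0000000002", "n_0000000000", "n_0000000001"));

        System.out.println(electLeader(live)); // n_0000000000

        // If the leader's znode disappears, the next-lowest candidate takes over.
        live.remove("n_0000000000");
        System.out.println(electLeader(live)); // n_0000000001
    }
}
```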
Solr has two distinct parts, indexing and searching. Indexing is a process of storing the data
in a proper format so that it can be searched later. Solr uses ZooKeeper for both indexing the
data in multiple nodes and searching from multiple nodes. ZooKeeper contributes the following
features:
Sharing of data between multiple nodes and subsequently searching from multiple nodes
for faster search results
Some of the use-cases of Apache Solr include e-commerce, job search, etc.