Zookeeper Architecture
Coordination service:
A coordination service lets distributed applications offload common
coordination challenges such as
- Synchronization between the nodes of the cluster.
- Distributing a common configuration to the nodes of the cluster.
- Grouping and naming services for the nodes of the cluster.
- Leader election among the nodes.
Znodes:
Zookeeper (Zk) helps the nodes of a distributed application coordinate with
each other by providing a common namespace.
Nodes can use this namespace to save and retrieve shared info that helps
coordination.
The namespace is hierarchical, much like a tree data structure.
Each element in this namespace is called a znode. A znode is identified by a
path whose components are separated by a slash (/), showing its position
relative to the root.
The namespace is held in memory, which keeps access fast.
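To make the path-based hierarchy concrete, here is a toy in-memory sketch. It assumes a plain dictionary keyed by absolute path; the class name `Namespace` and its methods are hypothetical and are not the ZooKeeper client API.

```python
class Namespace:
    """Toy in-memory znode tree keyed by absolute path (not the real ZooKeeper API)."""

    def __init__(self):
        self.nodes = {"/": b""}  # the root znode always exists

    def create(self, path, data=b""):
        # A znode can only be created under an existing parent.
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = data

    def get_children(self, path):
        # Direct children are the paths one level below `path`.
        prefix = path.rstrip("/") + "/"
        names = (p[len(prefix):] for p in self.nodes if p.startswith(prefix))
        return sorted(n for n in names if n and "/" not in n)


ns = Namespace()
ns.create("/app")
ns.create("/app/config", b"timeout=30")
ns.create("/app/workers")
print(ns.get_children("/app"))
```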
Ensemble:
Like the distributed applications it serves, Zookeeper itself is
distributed, i.e., a set of Zookeeper nodes work together to achieve its goal. This
group of Zookeeper nodes is called an ensemble.
Clients can talk to any node within the ensemble (via the zk client lib). Clients
periodically send heartbeats to the server and receive an ack back to confirm
connectivity.
Each node in the ensemble is aware of and talks to the other nodes to share info.
What is Zookeeper?
[Figures: znode namespace; ensemble]
Zookeeper Data Model
Each znode stores a stat structure that contains zxids (transaction IDs),
version numbers and timestamps; its ACL is stored with the znode as well. The
client receives this stat structure at the time of read. The version in the
stat structure is used to validate updates/deletes from the client: a write
that names a stale version is rejected.
Every create/update changes the stat structure: the version number
increments, the modification zxid advances, the modification timestamp is reset, etc.
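The version check can be sketched as optimistic concurrency control. This is a minimal in-memory model, not ZooKeeper code: `Store`, `Znode` and `BadVersion` are hypothetical names, and only the data version and modification zxid are modelled. Passing `-1` as the expected version skips the check, mirroring the convention of the real `setData` call.

```python
from dataclasses import dataclass


@dataclass
class Znode:
    data: bytes
    version: int = 0   # data version, bumped on every successful update
    mzxid: int = 0     # id of the transaction that last modified the node


class BadVersion(Exception):
    pass


class Store:
    def __init__(self):
        self.zxid = 0      # global transaction counter
        self.nodes = {}

    def create(self, path, data):
        self.zxid += 1
        self.nodes[path] = Znode(data, version=0, mzxid=self.zxid)

    def get(self, path):
        node = self.nodes[path]
        return node.data, node.version

    def set(self, path, data, expected_version):
        node = self.nodes[path]
        # Reject the write if the caller read an older version of the node.
        if expected_version != -1 and expected_version != node.version:
            raise BadVersion(f"expected {expected_version}, found {node.version}")
        self.zxid += 1
        node.data, node.version, node.mzxid = data, node.version + 1, self.zxid


store = Store()
store.create("/cfg", b"a")
_, seen = store.get("/cfg")        # two writers both read version 0
store.set("/cfg", b"b", seen)      # first writer wins, version becomes 1
try:
    store.set("/cfg", b"c", seen)  # second writer still holds version 0
    stale_write_rejected = False
except BadVersion:
    stale_write_rejected = True
```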
Types of znodes:
● Ephemeral:
○ Exists only as long as the session that created it
exists.
○ Cannot have children.
● Persistent:
○ Unlike ephemeral znodes, these persist across sessions
until explicitly deleted.
● Sequential:
○ Has a monotonically increasing number appended to its
name, which keeps the name unique under its parent.
○ Can be combined with either ephemeral or persistent.
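The following toy model illustrates how the ephemeral and sequential flags behave. `Tree` and its methods are hypothetical stand-ins, not the client API; the 10-digit zero-padded suffix mirrors the format ZooKeeper appends to sequential names.

```python
class Tree:
    def __init__(self):
        self.nodes = {}      # path -> owning session id (None for persistent)
        self.counters = {}   # parent path -> next sequence number

    def create(self, path, session=None, ephemeral=False, sequential=False):
        if sequential:
            # The counter is kept per parent, so names are unique among siblings.
            parent = path.rsplit("/", 1)[0]
            n = self.counters.get(parent, 0)
            self.counters[parent] = n + 1
            path = f"{path}{n:010d}"
        self.nodes[path] = session if ephemeral else None
        return path

    def close_session(self, session):
        # Ephemeral znodes disappear together with the session that made them.
        for path in [p for p, owner in self.nodes.items() if owner == session]:
            del self.nodes[path]


tree = Tree()
tree.create("/jobs")  # persistent parent
first = tree.create("/jobs/job-", session=1, ephemeral=True, sequential=True)
second = tree.create("/jobs/job-", session=2, ephemeral=True, sequential=True)
tree.close_session(1)
print(first, second, sorted(tree.nodes))
```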
Zookeeper Sessions
- The client lib configuration contains the list of all Zookeeper servers.
- The client establishes a connection to a randomly chosen server from the list.
- The connected server returns an auth token (the session ID and its
password) upon successful connection.
- The client and the connected server periodically exchange
heartbeats to confirm that each is alive.
- If the client loses connectivity, the client lib will, upon timeout,
connect to some other server from the config list. This switch is
transparent to the client application.
- During reconnection, the auth token from the previous connection is
presented so that the client can resume its existing session.
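A rough sketch of the heartbeat bookkeeping described above, assuming sessions are tracked by session id rather than by connection, so a client that reconnects through another server keeps its session alive. `SessionTracker` is a hypothetical name and time is passed in explicitly to keep the example deterministic.

```python
class SessionTracker:
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # session id -> time of last heartbeat

    def heartbeat(self, session_id, now):
        self.last_seen[session_id] = now

    def expire(self, now):
        # Sessions that stayed silent longer than the timeout are dropped.
        dead = [s for s, t in self.last_seen.items() if now - t > self.timeout]
        for s in dead:
            del self.last_seen[s]
        return dead


tracker = SessionTracker(timeout=10)
tracker.heartbeat("s1", now=0)
tracker.heartbeat("s2", now=0)
tracker.heartbeat("s1", now=8)    # s1 keeps pinging, possibly via another server
expired = tracker.expire(now=12)  # s2 has been silent for 12 > 10 time units
```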
Zookeeper Watches
- Read ops like getData(), getChildren(), exists() etc. have an
optional parameter to set a watch on the target znode.
- A watch is a one-time trigger: the Zookeeper server sends a single
change event to the watchers of the znode. Later changes to the
znode do not notify them unless the watch is set again.
- There are 2 kinds of watches
  - Data watches: watch for a change in the data of a
  znode.
  Set by getData() and exists().
  Triggered by setData(), and also by create() and delete() of the znode.
  - Child watches: watch for the addition/deletion of a
  child node under a parent znode.
  Set by getChildren().
  Triggered by create() and delete() of a child; delete() of the parent also triggers it.
- Example: servers A & B create ephemeral nodes 1 & 2
respectively.
- When A dies, its session expires and node 1 is deleted; B, which
is watching 1, is notified.
- B can now use that info to take corrective
action.
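The one-time nature of a watch can be shown with a small in-memory model. `WatchedStore` is a hypothetical class and callbacks stand in for the events a real client would receive.

```python
class WatchedStore:
    def __init__(self):
        self.data = {}
        self.watches = {}  # path -> callbacks waiting for the next change

    def get_data(self, path, watch=None):
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return self.data.get(path)

    def set_data(self, path, value):
        self.data[path] = value
        # Pop before firing: each registered watch is delivered at most once.
        for callback in self.watches.pop(path, []):
            callback(path)


events = []
store = WatchedStore()
store.set_data("/cfg", "v1")
store.get_data("/cfg", watch=events.append)  # read and leave a watch
store.set_data("/cfg", "v2")                 # fires the watch
store.set_data("/cfg", "v3")                 # no watch left, nothing fires
```

A client that wants to keep following the znode sets a new watch each time it re-reads the data.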
Read Path:
- In the leader/follower model, reads are eventually consistent.
- The client connects to one of the zk servers and requests the znode at a
path.
- The connected zk server authenticates the client and serves the read from
its locally stored namespace.
- Since it's a local copy, it can be stale.
- Every server stores its own copy of the namespace, so reads favor speed
and availability over freshness; a client that needs the latest value can
call sync() before reading.
Zookeeper Data Access
Write Path:
- The client connects to one of the zk servers and requests a create/update/delete of a
znode at a path in the namespace.
- Since all writes are handled by the leader, the connected zk server
forwards the write to the leader.
- The leader persists the data, broadcasts the write to all followers in the
cluster and awaits their responses.
- If a majority of the servers write it to their local namespace and respond,
we have a quorum and the write is a success.
- The initially connected zk server reports the write as a success to
the client.
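The read and write paths can be sketched together as a simplified majority rule. This is an illustration under strong assumptions: the real protocol separates proposal, acknowledgement and commit, whereas here a reachable follower applies the write immediately. `Server` and `quorum_write` are hypothetical names.

```python
class Server:
    def __init__(self, reachable=True):
        self.reachable = reachable
        self.data = {}  # this server's local copy of the namespace

    def apply(self, path, value):
        # A follower that cannot be reached neither applies nor acknowledges.
        if self.reachable:
            self.data[path] = value
        return self.reachable


def quorum_write(leader, followers, path, value):
    """Return True if a majority of the ensemble (leader included) applied the write."""
    leader.data[path] = value  # the leader persists first
    acks = 1 + sum(f.apply(path, value) for f in followers)
    return acks > (len(followers) + 1) // 2


leader, f1, f2 = Server(), Server(), Server(reachable=False)
ok = quorum_write(leader, [f1, f2], "/cfg", "v1")  # 2 of 3 servers acknowledge
stale = f2.data.get("/cfg")                        # f2 missed the broadcast
```

Here the write succeeds with two of three servers, while a read served from `f2`'s local copy would still miss the new value until `f2` catches up.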
Zk commands
Intent: Enforce a barrier while performing a crucial
job.
● Client calls exists(/b, true) to check if the barrier
exists and to set a watch.
● If barrier /b doesn't exist, create an ephemeral
node and proceed with the client job:
create(/b, EPHEMERAL)
● If barrier /b exists, the client waits for the watch
to trigger. At this point there may be multiple
clients waiting on a watch for the same
barrier /b.
● Once the client job is done, the client deletes the
barrier.
● The delete of the barrier node triggers a notification
to all watchers.
● Other waiting clients can now retry with calls
to exists(/b, true).
Usage: Critical updates/housekeeping tasks that must force
other processes to wait.
Recipe - Barrier
[Flowchart: each client calls exists(/b, true). No: create(/b, EPHEMERAL), run the client job, then delete(/b), which notifies the watchers of the state change. Yes: wait for the watch, then call exists(/b, true) again.]
Create & delete are atomic ops performed by the leader upon agreement with a quorum.
If several clients race to create the same znode, the leader orders the requests
sequentially, so only one create succeeds.
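The barrier recipe can be sketched against a toy single-process stand-in for the ensemble. `ToyZk` and `try_enter` are hypothetical names, and only deletion events are modelled. Note one deliberate difference from the steps in the slide: the sketch attempts the atomic create first and falls back to a watch, which closes the gap between the existence check and the create.

```python
class ToyZk:
    """Toy stand-in for an ensemble (hypothetical, not the real client API)."""

    def __init__(self):
        self.nodes = set()
        self.watches = {}  # path -> one-shot callbacks

    def create(self, path):
        if path in self.nodes:
            return False  # lost the race: another client holds the barrier
        self.nodes.add(path)
        return True

    def exists(self, path, watch=None):
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return path in self.nodes

    def delete(self, path):
        self.nodes.discard(path)
        for callback in self.watches.pop(path, []):
            callback()


def try_enter(zk, name, log, path="/b"):
    if zk.create(path):
        log.append(f"{name} runs")
        return True
    zk.exists(path, watch=lambda: log.append(f"{name} notified"))
    log.append(f"{name} waits")
    return False


zk, log = ToyZk(), []
try_enter(zk, "c1", log)  # c1 creates /b and runs its job
try_enter(zk, "c2", log)  # /b exists, so c2 waits on a watch
zk.delete("/b")           # c1 finishes; c2's watch fires
try_enter(zk, "c2", log)  # c2 retries and now gets through
```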
Recipe - Cluster Management
Intent: Notify nodes about the arrival or departure of other nodes in the
cluster.
● Create a PERSISTENT parent node /member
● Each client sets a child watch on the parent node /member
getChildren(/member, true)
● Each client creates an EPHEMERAL child node under /member
create(/member/host1, EPHEMERAL)
● Each client writes its status (CPU, memory, failures etc.) to its own
node in the hierarchy.
● The child watches fire for all watchers when a child node is added or
removed; a client then re-reads the children and sets the watch again.
Usage: Cluster monitoring or management for elastic scaling.
[Figure: clients c1, c2 and c3 each watch the parent /member and create/update their own child /member/c1, /member/c2, /member/c3; zk notifies the watchers.]
When client c3 creates /member/c3, zk notifies the other
watchers, viz. c1 and c2.
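A minimal sketch of the membership recipe, assuming a single parent whose children map to sessions. `Members` is a hypothetical in-memory model; the important detail it shows is that the monitor sets its child watch again every time it re-reads the list.

```python
class Members:
    def __init__(self):
        self.children = {}  # child name -> owning session
        self.watchers = []  # one-shot child watches on the parent

    def get_children(self, watch=None):
        if watch is not None:
            self.watchers.append(watch)
        return sorted(self.children)

    def join(self, name, session):
        self.children[name] = session
        self._notify()

    def session_closed(self, session):
        # Ephemeral children vanish with their session.
        gone = [n for n, s in self.children.items() if s == session]
        for n in gone:
            del self.children[n]
        if gone:
            self._notify()

    def _notify(self):
        pending, self.watchers = self.watchers, []
        for callback in pending:
            callback()


group, views = Members(), []


def refresh():
    # Re-read the membership and set the watch again in the same call.
    views.append(group.get_children(watch=refresh))


refresh()                    # initial view, watch set
group.join("c1", session=1)
group.join("c2", session=2)
group.session_closed(1)      # c1 crashes; its ephemeral node disappears
```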
Recipe - Queues
Intent: Create an ordered (FIFO) data access structure.
● Create a PERSISTENT parent node /queue
● Each client creates an EPHEMERAL & SEQUENTIAL child node
under /queue. Since it's sequential, Zookeeper appends a monotonically
increasing number at the end e.g., /queue/X-0001, /queue/X-0002...
create(/queue/X-, EPHEMERAL_SEQUENTIAL)
● A client that wants to access the nodes in insertion order simply
lists the children of /queue and sorts them by sequence number.
getChildren(/queue, true)
By enabling the watch on the parent, the accessor client is
notified when a child is created or removed by another client.
Usage: Handing items from producers to consumers in the order they were created.
[Figure: clients c1, c2 and c3 create /queue/x-0001, /queue/x-0002 and /queue/x-0003 under /queue; client c4 calls getChildren and watches for changes to the children.]
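Because the list of children comes back in no particular order, the consumer has to sort by the sequence suffix itself. A small sketch, assuming names end in `-` followed by the zero-padded counter; `in_fifo_order` is a hypothetical helper.

```python
def in_fifo_order(children):
    """Sort child names by the numeric suffix appended at creation time."""
    return sorted(children, key=lambda name: int(name.rsplit("-", 1)[1]))


# Names as they might come back from a getChildren call, in arbitrary order.
children = ["b-0000000002", "b-0000000000", "a-0000000001"]
ordered = in_fifo_order(children)
head = ordered[0]  # the oldest item, to be consumed first
```

Sorting on the number rather than the whole name matters when producers use different prefixes, as in this example.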
Recipe - Locks
Intent: Avoid race conditions by enforcing a lock/key pattern
1. Create a PERSISTENT parent node /lock
2. Each client creates an EPHEMERAL & SEQUENTIAL child node
under /lock. Since it's sequential, Zookeeper appends a monotonically
increasing number at the end e.g., /lock/X-0001, /lock/X-0002…
create(/lock/X-, EPHEMERAL_SEQUENTIAL)
3. Locks are granted in insertion order, from smallest to largest.
To check whether its znode is the lowest, the client invokes
getChildren(/lock, false)
4. If the lowest-numbered znode among the children is its own, the lock is
acquired. The client proceeds to do its job. Upon completion, it
releases the lock by deleting its znode.
delete(/lock/X-0001)
5. Else, it waits for its turn by setting a watch on its predecessor
znode. (If its immediate predecessor doesn't exist, look for the
one before it, and so on until one is found.)
exists(/lock/X-<n-1>, true)
6. When its predecessor znode is deleted or updated, the client is
notified.
7. When a client receives this event it goes to step 3.
[Figure: clients c1, c2 and c3 create /lock/x-0001, /lock/x-0002 and /lock/x-0003 under /lock; each client calls getChildren() and watches its predecessor.]
Each client checks for the existence of its predecessor.
It also needs to check with the parent whether it is now the 1st existing
child, in the event its predecessor dies.
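The decision each client makes after listing the children can be written as a pure function. This is a sketch with hypothetical names (`seq`, `lock_step`); the child names use a 10-digit suffix, and the actual watch and delete calls are left out.

```python
def seq(name):
    return int(name.rsplit("-", 1)[1])


def lock_step(my_node, children):
    """Return ("acquired", None) or ("wait", predecessor_to_watch)."""
    ordered = sorted(children, key=seq)
    index = ordered.index(my_node)
    if index == 0:
        return "acquired", None
    # Watch only the closest surviving predecessor, not the lock holder.
    return "wait", ordered[index - 1]


children = ["X-0000000003", "X-0000000001", "X-0000000002"]
holder = lock_step("X-0000000001", children)
waiter = lock_step("X-0000000003", children)
# The client behind X-0000000002 dies, so its ephemeral znode is gone.
after_death = lock_step("X-0000000003", ["X-0000000001", "X-0000000003"])
```

Recomputing the predecessor from a fresh list of children covers the case where the watched client dies before it ever held the lock.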
Recipe - Leader selection
Intent: Leader election
1. Create a PERSISTENT parent node /election
2. Each zk server creates an EPHEMERAL & SEQUENTIAL child
node under /election. Since it's sequential, Zookeeper appends a
monotonically increasing number at the end e.g., /election/X-0001,
/election/X-0002…
create(/election/X-, EPHEMERAL_SEQUENTIAL)
3. Each zk server checks if its znode is the smallest among all children
getChildren(/election, false)
4. If yes, it becomes the leader.
5. Else, it sets a watch on the znode just smaller than its own (its
closest predecessor).
exists(/election/X-<n-1>, true)
6. If the leader dies, so does its ephemeral znode, triggering a watch
event to only its successor (the next in line, which is watching it).
7. When a node receives this event it goes to step 3.
[Figure: Zk 1 (leader), Zk 2 and Zk 3 create /election/x-0001, /election/x-0002 and /election/x-0003 under /election; each server calls getChildren() and watches its predecessor.]
Each server checks for the existence of its predecessor.
It also needs to check with the parent whether it is now the 1st existing
child, in the event its predecessor dies.
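The point of watching only the predecessor is that a leader failure wakes a single candidate instead of the whole group. A sketch of that bookkeeping, with hypothetical helper names (`watch_targets`, `notified_on_death`) and 10-digit suffixes:

```python
def seq(name):
    return int(name.rsplit("-", 1)[1])


def watch_targets(children):
    """Map each candidate to the znode it watches (None marks the leader)."""
    ordered = sorted(children, key=seq)
    return {node: (ordered[i - 1] if i else None) for i, node in enumerate(ordered)}


def notified_on_death(children, dead):
    """Candidates whose watch fires when the znode `dead` disappears."""
    return [n for n, target in watch_targets(children).items() if target == dead]


children = ["X-0000000001", "X-0000000002", "X-0000000003"]
woken = notified_on_death(children, "X-0000000001")  # the leader's znode vanishes
survivors = [c for c in children if c != "X-0000000001"]
new_leader = [n for n, t in watch_targets(survivors).items() if t is None]
```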
Zookeeper Architecture

  • 2. What is Zookeeper?
    Coordination service: A coordination service lets distributed applications offload common challenges such as:
    - Synchronization between the nodes of the cluster.
    - Distributing a common configuration between the nodes of the cluster.
    - Grouping and naming services for each of the nodes of the cluster.
    - Leader election between the nodes.
    Znodes: Zookeeper (Zk) helps the nodes of a distributed application coordinate with each other by providing a common namespace. Nodes can use this namespace to save and retrieve any shared info that helps coordination. The namespace is hierarchical, much like a tree data structure. Each element in this namespace is called a znode, and its name is a path whose components are separated by a slash (/) to indicate its position in the hierarchy from the root. The namespace is stored in memory, which makes access fast.
    Ensemble: Like the distributed application clients that it serves, Zookeeper itself is distributed, i.e., a set of Zookeeper nodes work together to achieve its goal. This group of Zookeeper nodes is called an ensemble. Clients can talk to any node within the ensemble (via the Zk client lib). Clients periodically send heartbeats to the server and receive an ack back to confirm connectivity. Each node in the ensemble is aware of the other nodes and talks to them to share info.
    [Figures: Znodes namespace; Ensemble]
  • 3. Zookeeper Data Model
    Each znode stores a stat structure that contains the zxid (transaction ID), version number, timestamp, and ACL. The client receives this stat structure at the time of a read. The stat structure helps to validate updates and deletes from the client.
    With every creation or update the stat structure is updated: the version number increments, the zxid increments, the timestamp is reset, etc.
    Types of znodes:
    ● Ephemeral:
      ○ Exists as long as the session that created it exists.
      ○ Cannot have children.
    ● Persistent:
      ○ Unlike ephemeral znodes, these persist across sessions.
    ● Sequential:
      ○ Contains a monotonically increasing number as part of its name, which keeps the name unique.
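
How the version number in the stat structure can validate an update is easier to see in a small sketch. The code below is an in-memory stand-in, not the Zookeeper client API: `TinyStore`, `Znode`, and `BadVersionError` are hypothetical names chosen for illustration, and only the version field of the stat structure is modelled.

```python
from dataclasses import dataclass


class BadVersionError(Exception):
    """Raised when the caller's expected version is stale (hypothetical name)."""


@dataclass
class Znode:
    data: bytes
    version: int = 0  # bumped on every successful update


class TinyStore:
    """Hypothetical in-memory stand-in for a znode namespace."""

    def __init__(self):
        self._nodes = {}

    def create(self, path, data):
        self._nodes[path] = Znode(data)

    def get(self, path):
        node = self._nodes[path]
        return node.data, node.version

    def set(self, path, data, expected_version):
        node = self._nodes[path]
        # Reject the write if someone else updated the znode since our read.
        if expected_version != node.version:
            raise BadVersionError(path)
        node.data = data
        node.version += 1
        return node.version


store = TinyStore()
store.create("/config", b"v1")
_, seen = store.get("/config")         # the client reads and remembers the version
store.set("/config", b"v2", seen)      # first update matches, version is bumped
try:
    store.set("/config", b"v3", seen)  # second update reuses the stale version
    stale_rejected = False
except BadVersionError:
    stale_rejected = True
```

The second write fails because the version it carries no longer matches the znode, which is the kind of check the stat structure makes possible.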
  • 4. Zookeeper Sessions
    - The client lib configuration contains the list of all Zookeeper servers.
    - The client establishes a connection to a random server from the list.
    - The connected server sends an auth token upon successful connection.
    - Client and connected server periodically exchange heartbeats to confirm that each is alive.
    - If the client loses connectivity, the client lib, upon timeout, connects to another server from the config list. This switch is transparent to the client application.
    - During reconnection, the auth token from the previous connection is used to validate the attempt to reattach to the existing session.
    Zookeeper Watches
    - Client ops like getData(), getChildren(), exists(), etc. take an optional parameter that enables a watch on the target znode.
    - Zookeeper servers notify the watchers of a znode of a single change event. Successive changes to the znode do not notify the watchers unless the watch is set again.
    - There are 2 kinds of watches:
      - Data watches: watch for a change in data on a znode. getData() and exists() set data watches. create() and delete() on the znode also trigger them.
      - Children watches: watch for the addition or deletion of a child node under a parent znode. getChildren() sets a children watch. create() and delete() of a child also trigger it.
    Example:
    - Servers A and B create ephemeral nodes 1 and 2 respectively.
    - When A dies, B, which is watching node 1, is notified as node 1 is removed.
    - B can now use this info to take evasive action.
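
The one-shot nature of watches can be illustrated with a minimal in-memory model. This is a sketch, not the Zookeeper client: `WatchedNode`, `get_data`, and `set_data` are hypothetical names, and the event string is only a label used here.

```python
class WatchedNode:
    """Hypothetical znode whose watches fire once and are then discarded."""

    def __init__(self, data):
        self.data = data
        self._watchers = []

    def get_data(self, watcher=None):
        # Passing a watcher registers a one-time data watch.
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data

    def set_data(self, data):
        self.data = data
        # Clear the registered watchers before notifying them.
        fired, self._watchers = self._watchers, []
        for notify in fired:
            notify("NodeDataChanged")


events = []
node = WatchedNode(b"a")
node.get_data(watcher=events.append)  # watch set
node.set_data(b"b")                   # watch fires once
node.set_data(b"c")                   # no watch left, nothing is delivered
node.get_data(watcher=events.append)  # client sets the watch again
node.set_data(b"d")                   # fires again
```

A client that needs every change therefore has to re-register its watch each time it handles a notification, typically by reading the znode again.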
  • 5. Zookeeper Data Access
    Read Path:
    - In the Leader/Follower model, reads are eventually consistent.
    - The client connects to one of the Zk servers and requests a znode along a path.
    - The connected Zk server authenticates the client and serves the read from its locally stored namespace.
    - Since it's a local copy, it can be stale.
    - For reads, Zk servers choose availability over consistency, hence each server stores its own copy of the namespace.
    Write Path:
    - The client connects to one of the Zk servers and requests a create/delete of a znode along a path in the namespace.
    - Since all writes are handled by the leader, the connected Zk server forwards the write to the leader.
    - The leader persists the data, broadcasts the write to all followers in the cluster, and awaits their responses.
    - If a majority of them write to their local namespace and respond, there is a quorum and the write is a success.
    - The initially connected Zk server reports the write as a success to the client.
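
The majority rule in the write path comes down to a small piece of arithmetic. The sketch below assumes that acknowledgements are counted across the whole ensemble, leader included; `has_quorum` is a hypothetical helper, not part of any Zookeeper API.

```python
def has_quorum(acks, ensemble_size):
    """True when a strict majority of the ensemble acknowledged the write."""
    return acks > ensemble_size // 2


# How many servers may fail while a write can still reach a quorum.
tolerated_failures = {}
for size in (3, 4, 5):
    needed = size // 2 + 1
    tolerated_failures[size] = size - needed
```

Under this assumption a 4-node ensemble tolerates no more failures than a 3-node one, which is a common argument for odd ensemble sizes.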
  • 7. Recipe - Barrier
    Intent: Enforce a barricade while performing a crucial job.
    ● The client calls exists(/b, true) to check whether the barrier exists and to set a watch.
    ● If barrier /b doesn't exist, the client creates an ephemeral node and proceeds with its job:
      create(/b, EPHEMERAL)
    ● If barrier /b exists, the client waits for the watch to trigger. At this point multiple clients may be waiting and watching the same barrier /b.
    ● Once the client job is done, the client deletes the barrier.
    ● Deleting the barrier node triggers a notification to all watchers.
    ● The other waiting clients can now retry with calls to exists(/b, true).
    Usage: Critical updates or housekeeping tasks that force other processes to wait.
    Note: Create and delete are atomic ops performed by the leader upon agreement with the quorum. If multiple clients race to create the node, the leader guarantees that their requests are applied sequentially.
    [Flowchart: each client calls exists(/b, true). If no: create(/b, EPHEMERAL), run the client job, delete(/b), notify the state change to watchers. If yes: wait for the notification.]
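
The barrier steps above can be walked through with an in-memory model. Everything in this sketch is hypothetical (`BarrierStore`, `try_enter`, `NodeExistsError`, the event strings); it mimics only the behaviour described in the recipe: one-shot watches set through exists(), and a create that fails if the node is already present.

```python
class NodeExistsError(Exception):
    """Raised when create() targets a path that already exists."""


class BarrierStore:
    """Hypothetical single-process stand-in for the namespace."""

    def __init__(self):
        self._nodes = set()
        self._watchers = {}

    def _fire(self, path, event):
        # Watches are one-shot: remove them before notifying.
        for notify in self._watchers.pop(path, []):
            notify(event)

    def exists(self, path, watcher=None):
        if watcher is not None:
            self._watchers.setdefault(path, []).append(watcher)
        return path in self._nodes

    def create(self, path):
        if path in self._nodes:
            raise NodeExistsError(path)
        self._nodes.add(path)
        self._fire(path, "NodeCreated")

    def delete(self, path):
        self._nodes.discard(path)
        self._fire(path, "NodeDeleted")


def try_enter(store, path, on_event):
    """Return True if this client now holds the barrier, False if it must wait."""
    if store.exists(path, watcher=on_event):
        return False
    try:
        store.create(path)
    except NodeExistsError:
        return False  # lost the race to another client
    return True


store = BarrierStore()
woken = []
first = try_enter(store, "/b", lambda event: None)   # creates the barrier
second = try_enter(store, "/b", woken.append)        # barrier exists, waits
store.delete("/b")                                   # job done, watchers notified
retry = try_enter(store, "/b", lambda event: None)   # waiting client retries
```

The `NodeExistsError` branch covers the case where two clients both see the barrier missing and race to create it; only one create succeeds.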
  • 8. Recipe - Cluster Management
    Intent: Notify nodes about the arrival or departure of other nodes in the cluster.
    ● Create a PERSISTENT parent node /member.
    ● Each client sets a children watch on the parent node /member:
      getChildren(/member, true)
    ● Each client creates an EPHEMERAL child node under /member:
      create(/member/host1, false)
    ● Each client writes its status (CPU, memory, failure, etc.) to its own node in the hierarchy.
    ● A change to any child node triggers the watches of all watchers.
    Usage: Cluster monitoring or management for elastic scaling.
    [Diagram: clients c1, c2, c3 watch the parent /member and create/update /member/c1, /member/c2, /member/c3; Zk notifies the watchers.]
    When client c3 creates /member/c3, Zk notifies the other watchers, viz., c1 and c2.
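
A watch notification only says that something changed, so a client typically re-reads the children and compares them with what it saw before. A minimal sketch of that comparison, where `membership_change` is a hypothetical helper and the child names follow the diagram:

```python
def membership_change(before, after):
    """Return (joined, left) between two snapshots of the children of /member."""
    before, after = set(before), set(after)
    return sorted(after - before), sorted(before - after)


# c3 joins the cluster, then c1's session ends and its ephemeral node disappears.
joined, left = membership_change(["c1", "c2"], ["c1", "c2", "c3"])
joined_later, left_later = membership_change(["c1", "c2", "c3"], ["c2", "c3"])
```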
  • 9. Recipe - Queues
    Intent: Create ordered (FIFO) data access.
    ● Create a PERSISTENT parent node /queue.
    ● Each client creates an EPHEMERAL & SEQUENTIAL child node under /queue. Since it is sequential, a monotonically increasing number is appended at the end, e.g., /queue/X-00001, /queue/X-0002...
      create(/queue/X-, false)
    ● A client that wants to access the nodes in insertion order simply fetches all children of the parent:
      getChildren(/queue, true)
      By enabling the watch on the parent, the accessor client is notified when a child is created or removed externally.
    Usage: Cluster monitoring or management for elastic scaling.
    [Diagram: clients c1, c2, c3 create /queue/x-0001, /queue/x-0002, /queue/x-0003; client c4 calls getChildren and watches for changes to children.]
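
The list returned by getChildren() is not guaranteed to be sorted, so the consumer usually orders the children by their sequence suffix itself. A minimal sketch, assuming child names of the form `X-<number>`; `sequence_of` and `in_fifo_order` are hypothetical helpers:

```python
def sequence_of(name):
    """Extract the numeric suffix that a SEQUENTIAL create appends to the name."""
    return int(name.rsplit("-", 1)[1])


def in_fifo_order(children):
    """Sort child names by sequence number, i.e. by insertion order."""
    return sorted(children, key=sequence_of)


children = ["X-0000000010", "X-0000000002", "X-0000000001"]
ordered = in_fifo_order(children)
head = ordered[0]  # the oldest element, the next one to consume
```

Sorting on the parsed integer rather than on the raw string keeps the order correct even if the names carry different prefixes or paddings.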
  • 10. Recipe - Locks
    Intent: Avoid race conditions by enforcing a lock/key pattern.
    1. Create a PERSISTENT parent node /lock.
    2. Each client creates an EPHEMERAL & SEQUENTIAL child node under /lock. Since it is sequential, a monotonically increasing number is appended at the end, e.g., /lock/X-00001, /lock/X-0002…
       create(/lock/X-, false)
    3. Locks are granted in insertion order, from smallest to largest. To check whether its znode is the lowest, the client invokes:
       getChildren(/lock, false)
    4. If the first znode in the list of children is its own, the lock is acquired. The client proceeds to do its job and, upon completion, releases the lock by deleting its znode:
       delete(/lock/X-00001)
    5. Else, it waits for its turn by setting a watch on its predecessor znode. (If its immediate predecessor doesn't exist, it looks for the one before, and so on until it finds one.)
       exists(/lock/X-00000n - 1)
    6. When its predecessor znode is deleted or updated, the client is notified.
    7. When a node receives this event, it goes to step 3.
    [Diagram: clients c1, c2, c3 create /lock/x-0001, /lock/x-0002, /lock/x-0003; each calls getChildren() and watches its predecessor.]
    Note: Each client checks for the existence of its predecessor. In the event that its predecessor dies, it also needs to check with the parent whether it is now the first existing child.
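
Steps 3 to 5 above reduce to one decision over the current list of children. The sketch below is a pure function over child names, not a Zookeeper client call; `lock_step` and `sequence_of` are hypothetical helpers, and names are assumed to end in `-<number>`.

```python
def sequence_of(name):
    """Numeric suffix appended by a SEQUENTIAL create."""
    return int(name.rsplit("-", 1)[1])


def lock_step(children, mine):
    """Decide what the owner of znode `mine` should do next.

    Returns ("acquired", None) if `mine` is the lowest child, otherwise
    ("wait", predecessor), where predecessor is the nearest lower child
    that still exists.
    """
    ordered = sorted(children, key=sequence_of)
    index = ordered.index(mine)
    if index == 0:
        return ("acquired", None)
    return ("wait", ordered[index - 1])


# X-...0002 and X-...0004 are already gone (their clients finished or died).
children = ["X-0000000005", "X-0000000001", "X-0000000003"]
holder = lock_step(children, "X-0000000001")
waiter = lock_step(children, "X-0000000005")
```

Because the list holds only znodes that currently exist, taking the previous entry in sorted order already skips missing predecessors, as the parenthetical in step 5 requires.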
  • 11. Recipe - Leader selection
    Intent: Leader election
    1. Create a PERSISTENT parent node /election.
    2. Each Zk server creates an EPHEMERAL & SEQUENTIAL child node under /election. Since it is sequential, a monotonically increasing number is appended at the end, e.g., /election/X-00001, /election/X-0002…
       create(/election/X-, false)
    3. Each Zk server checks whether its znode is the smallest among all children:
       getChildren(/election, false)
    4. If yes, it becomes the leader.
    5. Else, it sets a watch on the znode just smaller than its own (its closest predecessor):
       exists(/election/X-00000n - 1)
    6. If the leader dies, so does its ephemeral znode, triggering a watch event to only its successor (the next in line, which is watching it).
    7. When a node receives this event, it goes to step 3.
    [Diagram: Zk 1 (Leader), Zk 2, Zk 3 create /election/x-0001, /election/x-0002, /election/x-0003; each calls getChildren() and watches its predecessor.]
    Note: Each server checks for the existence of its predecessor. In the event that its predecessor dies, it also needs to check with the parent whether it is now the first existing child.
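
Why only the successor is notified when the leader dies becomes visible once the watch relationships are written out. This sketch computes them from a list of child names; `watch_chain` and `sequence_of` are hypothetical helpers, and no Zookeeper client is involved.

```python
def sequence_of(name):
    """Numeric suffix appended by a SEQUENTIAL create."""
    return int(name.rsplit("-", 1)[1])


def watch_chain(children):
    """Return (leader, watches), where watches maps each znode to the one it watches."""
    ordered = sorted(children, key=sequence_of)
    watches = {later: earlier for earlier, later in zip(ordered, ordered[1:])}
    return ordered[0], watches


leader, watches = watch_chain(["x-0003", "x-0001", "x-0002"])

# The leader's session ends, so its ephemeral znode x-0001 disappears.
# Only x-0002 was watching it; x-0003 keeps watching x-0002 undisturbed.
new_leader, new_watches = watch_chain(["x-0003", "x-0002"])
```

Each znode is watched by at most one other znode, so a leader failure wakes a single server instead of the whole group.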
  • 12. References
    ● Brief Architecture: https://data-flair.training/blogs/zookeeper-architecture/
    ● Datamodel: https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#ch_zkDataModel
    ● Zk API: https://www.tutorialspoint.com/zookeeper/zookeeper_api.htm
    ● Overview: https://www.slideshare.net/scottleber/apache-zookeeper