Neo4j
   High Availability
  New Auto-Cluster

Michael Hunger - @mesirii
                            1
High Availability Cluster
  ๏Neo4j Enterprise
  ๏Master-Slave Replication
  ๏read-scaling and fault-tolerance
  ๏eventual consistency
    • write to master (push_factor)

    • write to slaves
                                      2
3 Separate Concerns (I)
๏Cluster Management
  •   Members join/leave/heartbeat
๏Failover
  •   Master Election

  • Distribution of Master-Status


                                     3
3 Separate Concerns (II)
๏Replication
  •synchronized id-generation

  • distributed locks

  • pull, push of transactions

  • initial store synchronization


                                    4
Pre 1.9 - ZooKeeper


                  5
Pre 1.9
๏Apache ZooKeeper took care of two of the concerns
  •   Cluster Management
      ‣new members register with ZK
  •   Failover
      ‣ZK stores Master and last TX-Id
      ‣ZK uses ZAB to determine new Master
       and distribute information
                                         6
HA Cluster

[Diagram: one Master, two Slaves, and one RO-Slave, with three Coordinators]
                                                     7
Pre 1.9 - Problems
๏additional setup and operation of a separate
   component
๏unreliable operation / hiccups
๏long-term stability
๏no dynamic reconfiguration of the ZK cluster,
   important for cloud setups
                                         8
Post 1.9 -
Neo4j Auto Cluster


                 9
Replace ZooKeeper!?
๏Implement Multi-Paxos ourselves
๏simple, testable code
๏only covers
  • cluster management,

  • master election


                                   10
HA Cluster




             11
What is Paxos?
๏reliable consensus making
๏broadcasting
๏works even with unreliable communication
  • lost messages
  • delays, out-of-order delivery
๏does not guarantee progress
                                       12
What is Paxos?




                 13
Implementation
๏everything is a State Machine
  • SM = stateless enums + context
  • Message = type enum + payload
  • State = enum instance
  • Transition = handle() messages,
    switch on msg-type, implement logic
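
The bullets above can be sketched in Java roughly as follows. This is a minimal illustration, not the actual Neo4j code: the names `MsgType`, `Message`, `MemberContext`, `MemberState` and `MemberStateMachine`, as well as the three states, are hypothetical. It shows states as enum instances that hold no data themselves, a context object that holds the data, and a `handle()` method that switches on the message type and returns the next state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical message types for illustration only.
enum MsgType { JOIN, JOIN_OK, LEAVE }

// Message = type enum + payload.
final class Message {
    final MsgType type;
    final Object payload;

    Message(MsgType type, Object payload) {
        this.type = type;
        this.payload = payload;
    }
}

// All mutable data lives in the context, so the state enums stay stateless.
final class MemberContext {
    final List<String> events = new ArrayList<>();
}

// State = enum instance. handle() switches on the message type and
// returns the next state, which is the transition.
enum MemberState {
    START {
        @Override
        MemberState handle(MemberContext ctx, Message msg) {
            switch (msg.type) {
                case JOIN:
                    ctx.events.add("join requested by " + msg.payload);
                    return JOINING;
                default:
                    return this; // ignore anything else in this state
            }
        }
    },
    JOINING {
        @Override
        MemberState handle(MemberContext ctx, Message msg) {
            switch (msg.type) {
                case JOIN_OK:
                    ctx.events.add("joined");
                    return JOINED;
                case LEAVE:
                    return START;
                default:
                    return this;
            }
        }
    },
    JOINED {
        @Override
        MemberState handle(MemberContext ctx, Message msg) {
            switch (msg.type) {
                case LEAVE:
                    ctx.events.add("left");
                    return START;
                default:
                    return this;
            }
        }
    };

    abstract MemberState handle(MemberContext ctx, Message msg);
}

// Thin driver: keeps the current state and feeds messages to it.
final class MemberStateMachine {
    private MemberState state = MemberState.START;
    private final MemberContext ctx = new MemberContext();

    MemberState state() { return state; }
    MemberContext context() { return ctx; }

    void receive(Message msg) {
        state = state.handle(ctx, msg);
    }
}
```

Because the enum instances carry no fields, a test can drive one machine with hand-built messages and inspect only the context and the returned state.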


                                          14
Implementation (II)
๏everything is a State Machine
  • use timeouts for reliability
  • handle failing messages
  • decouple network and time
      ‣for testability
  • listeners interact on messages with
      outside world, sync or async
                                            15
Implementation (II)
๏Paxos (3 roles)
  • Proposer-SM
  • Acceptor-SM
  • Learner-SM
๏Cluster
  • Heartbeat
[Diagram: Paxos (Proposer, Acceptor, Learner), ClusterState, Heartbeat]
                                                           16
Multi-Paxos (happy path)
[Sequence diagram with three lanes: Learner, Proposer, Acceptor (2 * f + 1)]
  • Messages: PREPARE; PROMISE or REJECT; ACCEPT; ACCEPTED or REJECTED; LEARN
  • Proposer: TIMEOUT on PREPARE and on ACCEPT; CHECK, STORE RESPONSES; IF QUORUM MET, CANCEL TIMEOUT
  • Acceptor: VALUE MATCH / NO MATCH on PREPARE; MATCHES PROMISE? STORE VALUE / NO on ACCEPT
  • Learner: STORE VALUE; OUT OF ORDER MSG HANDLING; DELIVER ALL VALID; ATOMIC BC
  • Learner: A VALUE IS MISSING, LEARN TIMEOUT; WE STILL DON'T KNOW, LEARN TIMEOUT
                                                                                 17
Multi-Paxos (happy path)
[Sequence diagram, continued from the previous slide: the steps from PROMISE and ACCEPT through LEARN are shown again, followed by the learner catch-up]
  • Learner: A VALUE IS MISSING, LEARN TIMEOUT; WE STILL DON'T KNOW, LEARN TIMEOUT
  • Learner to other Learner: LEARN REQ, guarded by LEARN TIMEOUT
  • other Learner: HAVE VALUE, reply LEARN; DON'T KNOW, reply LEARN FAIL
                                                                      18
Acceptor State Machine
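
A minimal sketch of the two checks an acceptor makes, assuming a single Paxos instance and plain `long` ballot numbers. The class name `AcceptorSketch` and its methods are hypothetical and not taken from the Neo4j code. A complete acceptor would also send its previously accepted ballot and value back with each PROMISE; here they are only exposed through accessors.

```java
// Hypothetical single-instance acceptor; ballots are plain longs.
final class AcceptorSketch {
    enum Reply { PROMISE, REJECT, ACCEPTED, REJECTED }

    private long promisedBallot = -1; // highest ballot promised so far
    private long acceptedBallot = -1; // ballot of the stored value, -1 if none
    private Object acceptedValue;     // stored value, null if none

    // PREPARE: promise only if the ballot is higher than any promised before.
    Reply onPrepare(long ballot) {
        if (ballot > promisedBallot) {
            promisedBallot = ballot;
            return Reply.PROMISE;
        }
        return Reply.REJECT;
    }

    // ACCEPT: store the value only if the ballot is not older than the promise.
    Reply onAccept(long ballot, Object value) {
        if (ballot >= promisedBallot) {
            promisedBallot = ballot;
            acceptedBallot = ballot;
            acceptedValue = value;
            return Reply.ACCEPTED;
        }
        return Reply.REJECTED;
    }

    long acceptedBallot() { return acceptedBallot; }
    Object acceptedValue() { return acceptedValue; }
}
```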




                         19
Heartbeat State Machine
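
One possible shape for the failure-detection part of a heartbeat, sketched under assumptions: the class `HeartbeatTracker`, its method names and the timeout rule are hypothetical, not the actual Neo4j implementation. The current time is passed in explicitly rather than read from the system clock, in line with the deck's point about decoupling time for testability.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical tracker: a member is suspected once nothing has been
// heard from it for longer than the timeout.
final class HeartbeatTracker {
    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Any received message can count as a sign of life.
    void onMessageFrom(String member, long nowMillis) {
        lastSeen.put(member, nowMillis);
    }

    // Members whose last message is older than the timeout, sorted by name.
    Set<String> suspected(long nowMillis) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Long> entry : lastSeen.entrySet()) {
            if (nowMillis - entry.getValue() > timeoutMillis) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```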




                          20
Implementation (III)
๏HA Implementation uses state machines as
   infrastructure

๏notifications via listeners
๏piggyback heartbeat on messages
๏master election
  • (all - failed) have to agree
  • Paxos BC needs quorum of total
                                       21
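
The "quorum of total" rule can be made concrete with a little arithmetic. The sketch below assumes a simple majority quorum counted against the total configured cluster size, which fits the 2 * f + 1 acceptors shown in the Multi-Paxos diagram. The class `QuorumMath` and its method names are hypothetical.

```java
// Hypothetical helper for majority-quorum arithmetic.
final class QuorumMath {
    private QuorumMath() {}

    // Smallest number of members that forms a majority of the total.
    static int majority(int total) {
        return total / 2 + 1;
    }

    // Number of failed members f a cluster of this size can tolerate (total >= 2 * f + 1).
    static int tolerableFailures(int total) {
        return (total - 1) / 2;
    }

    // The quorum is counted against the total, not against the members still alive.
    static boolean hasQuorum(int total, int alive) {
        return alive >= majority(total);
    }
}
```

Under this assumption a 3-member cluster keeps a quorum with one member down but not with two, and growing from 3 to 4 members does not raise the number of tolerated failures.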
Multi-Paxos
๏everything is a State Machine
  • use timeouts for reliability
  • handle failing messages
  • decouple network and time
      ‣for testability
  • listeners interact on messages with
      outside world, sync or async
                                            22
Unit-Testing

•   Mock Time
    ‣fast running tests despite timeouts
•   Mock Network
    ‣simulate delays, failing messages
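
A minimal sketch of the "Mock Time" idea, assuming timeouts are registered with an object the test controls instead of a real timer. The class `FakeTimeouts` and its methods are hypothetical. Advancing the fake clock fires every due timeout immediately, so a test covering a multi-second timeout runs in no real time. A mock network could follow the same pattern by queuing messages and letting the test decide which ones to deliver, delay or drop.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical test double for time: timeouts fire only when the test
// advances the clock, never because real time passes.
final class FakeTimeouts {
    private static final class Pending {
        final long dueAt;
        final Runnable action;

        Pending(long dueAt, Runnable action) {
            this.dueAt = dueAt;
            this.action = action;
        }
    }

    private long now = 0;
    private final List<Pending> pending = new ArrayList<>();

    long now() { return now; }

    void schedule(long delayMillis, Runnable action) {
        pending.add(new Pending(now + delayMillis, action));
    }

    // Moves the clock forward and runs every timeout that is now due,
    // in the order in which the timeouts were scheduled.
    void advance(long millis) {
        now += millis;
        List<Pending> due = new ArrayList<>();
        for (Iterator<Pending> it = pending.iterator(); it.hasNext();) {
            Pending p = it.next();
            if (p.dueAt <= now) {
                it.remove();
                due.add(p);
            }
        }
        // Run after collecting, so an action may schedule new timeouts safely.
        for (Pending p : due) {
            p.action.run();
        }
    }
}
```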




                                           23
Unit-Test-Example




                    24
Setup
  • Config
  • Video
  • Auto-Setup Script (Demo)




                                     25
Thank You - Questions?



                         26
