Lecture-04

The lecture discusses the importance of failure detection in distributed and cloud computing, emphasizing that failures are common in datacenters. It outlines various methods for building failure detectors, including centralized and gossip-style approaches, and highlights the properties that effective detectors should possess, such as completeness, accuracy, and speed. The presentation also covers the SWIM protocol as a more efficient alternative to traditional heartbeating methods for detecting failures in large systems.

Uploaded by

Raihan Kabir Rifat

CSE-813(Distributed and Cloud Computing)

Dr. Atiqur Rahman

ড. আতিকুর রহমান
Ph.D.(CQUPT, China), MS.Engg.(CU), B.Sc.(CU)
Associate Professor
Department of Computer Science and Engineering
University of Chittagong

Lecture 4: Failure Detection and Membership
A Challenge
• You’ve been put in charge of a datacenter, and your manager has told you, “Oh no! We don’t have any failures in our datacenter!”
• Do you believe him/her?
• What would be your first responsibility?
• Build a failure detector
• What are some things that could go wrong if you didn’t do this?
Failures are the Norm
… not the exception, in datacenters.

Say, the rate of failure of one machine (OS/disk/motherboard/network, etc.) is once every 10 years (120 months) on average.

When you have 120 servers in the DC, the mean time to failure (MTTF) of the next machine is 1 month.

When you have 12,000 servers in the DC, the MTTF is about 7.2 hours!

Soft crashes and failures are even more frequent!
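The arithmetic behind these figures can be checked with a short sketch. It assumes machines fail independently at a constant rate and that a month has 30 days; the function name `cluster_mttf_months` is hypothetical and used only for illustration.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month


def cluster_mttf_months(per_machine_mttf_months: float, num_servers: int) -> float:
    """Mean time until the next failure anywhere in the cluster, in months.

    Assumes independent machines with a constant failure rate, so the
    cluster-wide failure rate is num_servers times the per-machine rate.
    """
    return per_machine_mttf_months / num_servers


for n in (1, 120, 12_000):
    months = cluster_mttf_months(120, n)
    print(f"{n:>6} servers: {months:g} months = {months * HOURS_PER_MONTH:g} hours")
```

With a per-machine MTTF of 120 months, 120 servers give 1 month and 12,000 servers give 0.01 months, which is the 7.2 hours quoted above.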

To build a failure detector
• You have a few options
1. Hire 1000 people, each to monitor one machine in the datacenter and report to you when it fails.
2. Write a failure detector program (distributed) that automatically detects failures and reports to your workstation.
Target Settings
• Process ‘group’-based systems
– Clouds/Datacenters
– Replicated servers
– Distributed databases

• Crash-stop/Fail-stop process failures

Group Membership Service
[Figure: the application process pi issues application queries (e.g., gossip, overlays, DHT’s, etc.) against its membership list (the group list). The membership protocol updates that list on joins, leaves, and failures of members, and runs over unreliable communication.]
Two sub-protocols
[Figure: application process pi holds the group membership list, maintained by two sub-protocols, a Failure Detector and a Dissemination component, which reach other processes such as pj over unreliable communication.]
• Complete list all the time (Strongly consistent)
– Virtual synchrony
• Almost-Complete list (Weakly consistent)
– Gossip-style, SWIM, … (focus of this series of lectures)
• Or Partial-random list (other systems)
– SCAMP, T-MAN, Cyclon, …
Large Group: Scalability A Goal
[Figure: a process group of 1000’s of processes (“members”), one of which is us (pi), connected by an unreliable communication network.]
Group Membership Protocol
[Figure: processes connected by an unreliable communication network; crash-stop failures only.]
I. pj crashed
II. Failure Detector: some process pi finds out quickly
III. Dissemination
Next
• How do you design a group membership
protocol?

I. pj crashes
• Nothing we can do about it!
• A frequent occurrence
• Common case rather than exception
• Frequency goes up linearly with size of datacenter
II. Distributed Failure Detectors: Desirable Properties
• Completeness = each failure is detected
• Accuracy = there is no mistaken detection
• Speed
– Time to first detection of a failure
• Scale
– Equal Load on each member
– Network Message Load
Distributed Failure Detectors: Properties
• Completeness
• Accuracy
– Completeness and accuracy are impossible to guarantee together in lossy networks [Chandra and Toueg]
– If possible, then can solve consensus!
• Speed
– Time to first detection of a failure
• Scale
– Equal Load on each member
– Network Message Load
What Real Failure Detectors Prefer
• Completeness: guaranteed
• Accuracy: partial/probabilistic guarantee
• Speed
– Time to first detection of a failure (time until some process detects the failure)
• Scale (no bottlenecks/single failure point)
– Equal Load on each member
– Network Message Load
Failure Detector Properties
• Completeness: in spite of arbitrary simultaneous process failures
• Accuracy
• Speed
– Time to first detection of a failure
• Scale
– Equal Load on each member
– Network Message Load
Centralized Heartbeating
[Figure: each process pi sends (pi, Heartbeat Seq. l++) to a single central process pj.]
• Heartbeats sent periodically
• If heartbeat not received from pi within timeout, mark pi as failed
• Downside: pj is a hotspot
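A minimal sketch of the central monitor's bookkeeping, assuming heartbeats arrive as (member, sequence number) pairs and time is passed in explicitly. The class and method names are hypothetical and not taken from the lecture.

```python
class CentralHeartbeatMonitor:
    """Runs at the central process; every other member sends it heartbeats."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_seen = {}  # member -> (highest seq seen, local receive time)

    def on_heartbeat(self, member: str, seq: int, now: float) -> None:
        prev = self.last_seen.get(member)
        # Only a strictly newer sequence number refreshes the timestamp,
        # so delayed or duplicated heartbeats are ignored.
        if prev is None or seq > prev[0]:
            self.last_seen[member] = (seq, now)

    def failed_members(self, now: float) -> list:
        """Members whose last fresh heartbeat is older than the timeout."""
        return sorted(m for m, (_, t) in self.last_seen.items()
                      if now - t > self.timeout)


monitor = CentralHeartbeatMonitor(timeout=3.0)
monitor.on_heartbeat("p1", 1, now=0.0)
monitor.on_heartbeat("p2", 1, now=0.0)
monitor.on_heartbeat("p1", 2, now=2.0)
print(monitor.failed_members(now=4.0))  # p2 has been silent for 4.0 > 3.0
```

All state lives in one process here, which is the hotspot the slide points out: if the monitor crashes, nobody detects anything.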
Ring Heartbeating
[Figure: processes arranged in a ring; pi sends (pi, Heartbeat Seq. l++) to a ring neighbor such as pj.]
• Downside: unpredictable on simultaneous multiple failures
All-to-All Heartbeating
[Figure: pi sends (pi, Heartbeat Seq. l++) to all other processes, including pj.]
• Upside: equal load per member
Next
• How do we increase the robustness of all-to-all heartbeating?
Gossip-style Heartbeating
[Figure: pi sends an array of Heartbeat Seq. l for a member subset.]
• Upside: good accuracy properties
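Gossip-Style Failure Detection
Protocol:
• Nodes periodically gossip their membership list: pick random nodes, send them the list
• On receipt, it is merged with the local membership list
• When an entry times out, member is marked as failed

[Figure: four nodes (1 to 4) with asynchronous clocks; node 2 receives a gossiped membership list and merges it into its own. Current time: 70 at node 2.]

List received by node 2:
Address  Heartbeat Counter  Time (local)
1        10120              66
2        10103              62
3        10098              63
4        10111              65

List at node 2 before the merge:
Address  Heartbeat Counter  Time (local)
1        10118              64
2        10110              64
3        10090              58
4        10111              65

List at node 2 after the merge:
Address  Heartbeat Counter  Time (local)
1        10120              70
2        10110              64
3        10098              70
4        10111              65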
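The merge step can be sketched as follows, using the numbers from the example on this slide. The rule assumed here is the usual one for gossip-style heartbeating: keep the higher heartbeat counter per address, and stamp any entry that advanced with the receiver's own local time. The function name `merge` and the dictionary layout are illustrative assumptions.

```python
def merge(local: dict, incoming: dict, now: int) -> dict:
    """Merge a received membership list into the local one.

    Each list maps address -> (heartbeat counter, local time of last update).
    The sender's timestamps are ignored because clocks are asynchronous.
    """
    merged = dict(local)
    for addr, (heartbeat, _sender_time) in incoming.items():
        if addr not in merged or heartbeat > merged[addr][0]:
            merged[addr] = (heartbeat, now)  # entry advanced: restamp locally
    return merged


received = {1: (10120, 66), 2: (10103, 62), 3: (10098, 63), 4: (10111, 65)}
node2 = {1: (10118, 64), 2: (10110, 64), 3: (10090, 58), 4: (10111, 65)}

for addr, entry in sorted(merge(node2, received, now=70).items()):
    print(addr, *entry)
```

Entries 1 and 3 advance and are restamped with time 70; entries 2 and 4 are unchanged because the received counters are not higher.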
Gossip-Style Failure Detection
• If the heartbeat has not increased for more than Tfail seconds, the member is considered failed
• And after Tcleanup seconds, it will delete the member from the list
• Why two different timeouts?
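Gossip-Style Failure Detection
• What if an entry pointing to a failed node is deleted right after Tfail (=24) seconds?

[Figure: four nodes (1 to 4); node 1 gossips its list to node 2. Current time: 75 at node 2.]

List at node 1:
Address  Heartbeat Counter  Time (local)
1        10120              66
2        10103              62
3        10098              55
4        10111              65

List at node 2:
Address  Heartbeat Counter  Time (local)
1        10120              66
2        10110              64
3        10098              50 (shown again with time 75 after node 1's list arrives)
4        10111              65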
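One way to see why two timeouts help is the sketch below. It assumes a failed entry is kept (but treated as failed) for a further Tcleanup seconds, and that a gossiped heartbeat which is not higher than the stored one never refreshes an entry. The class `MembershipList` and its methods are hypothetical names, and setting Tcleanup to 0 models "delete right after Tfail".

```python
class MembershipList:
    def __init__(self, t_fail: int, t_cleanup: int):
        self.t_fail, self.t_cleanup = t_fail, t_cleanup
        self.entries = {}  # address -> [heartbeat counter, local update time]

    def state(self, addr: int, now: int) -> str:
        _, updated = self.entries[addr]
        return "failed" if now - updated > self.t_fail else "alive"

    def merge(self, incoming: dict, now: int) -> None:
        for addr, heartbeat in incoming.items():
            current = self.entries.get(addr)
            if current is None:
                self.entries[addr] = [heartbeat, now]      # unknown: looks new
            elif heartbeat > current[0]:
                current[0], current[1] = heartbeat, now    # fresh heartbeat
            # otherwise stale: a remembered failed entry is not refreshed

    def sweep(self, now: int) -> None:
        limit = self.t_fail + self.t_cleanup
        for addr in [a for a, (_, t) in self.entries.items() if now - t > limit]:
            del self.entries[addr]


for t_cleanup in (0, 24):
    node2 = MembershipList(t_fail=24, t_cleanup=t_cleanup)
    node2.entries = {3: [10098, 50]}   # node 3 last advanced at local time 50
    node2.sweep(now=75)
    node2.merge({3: 10098}, now=75)    # another node still gossips the old entry
    print(t_cleanup, node2.entries[3], node2.state(3, now=75))
```

With immediate deletion the stale entry is re-added with a fresh timestamp and node 3 looks alive again; with a cleanup period the old entry is still present, the stale gossip is ignored, and node 3 stays failed until it is finally removed.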
Multi-level Gossiping
[Figure: two subnets of N/2 nodes each, joined by a router.]
• Network topology is hierarchical
• Random gossip target selection => core routers face O(N) load (Why?)
• Fix: In subnet i, which contains ni nodes, pick gossip target in your subnet with probability (1-1/ni)
• Router load=O(1)
• Dissemination time=O(log(N))
• What about latency for multi-level topologies?
(Slide corrected after lecture)
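The subnet-aware target choice can be sketched as below. The representation of subnets as lists of node ids and the function name `pick_gossip_target` are assumptions made for illustration; only the probability (1-1/ni) comes from the slide.

```python
import random


def pick_gossip_target(me, subnets, rng=random):
    """Pick a gossip target, staying inside my subnet with probability 1 - 1/n_i.

    subnets is a list of lists of node ids; me must appear in exactly one,
    and there must be at least one other node somewhere.
    """
    mine = next(s for s in subnets if me in s)
    n_i = len(mine)
    local_peers = [n for n in mine if n != me]
    remote_peers = [n for s in subnets if s is not mine for n in s]
    stay_local = rng.random() < 1 - 1 / n_i
    if local_peers and (stay_local or not remote_peers):
        return rng.choice(local_peers)
    return rng.choice(remote_peers)


rng = random.Random(0)
subnets = [list(range(50)), list(range(50, 100))]
trials = 20_000
crossings = sum(pick_gossip_target(0, subnets, rng) >= 50 for _ in range(trials))
print(f"fraction of gossips crossing the router: {crossings / trials:.3f}")
```

Each of the ni nodes in a subnet crosses the router with probability 1/ni per gossip round, so the expected number of cross-subnet messages per round from that subnet is about 1, which is where the O(1) router load comes from.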
Analysis/Discussion
• What happens if gossip period Tgossip is decreased?
• A single heartbeat takes O(log(N)) time to propagate. So N heartbeats take:
– O(log(N)) time to propagate, if the bandwidth allowed per node is O(N)
– O(N.log(N)) time to propagate, if the bandwidth allowed per node is only O(1)
– What about O(k) bandwidth?
• What happens to Pmistake (false positive rate) as Tfail, Tcleanup are increased?
• Tradeoff: False positive rate vs. detection time vs. bandwidth
Next
• So, is this the best we can do? What is the best we can do?
Failure Detector Properties …
• Completeness
• Accuracy
• Speed
– Time to first detection of a failure
• Scale
– Equal Load on each member
– Network Message Load
…Are application-defined Requirements
• Completeness: guarantee always
• Accuracy: probability PM(T)
• Speed: T time units
– Time to first detection of a failure
• Scale: N*L (compare this across protocols)
– Equal Load on each member
– Network Message Load
All-to-All Heartbeating
[Figure: pi sends (pi, Heartbeat Seq. l++) to all other members every T units.]
• L=N/T
Gossip-style Heartbeating
[Figure: pi sends an array of Heartbeat Seq. l for a member subset.]
• Every tg units (= gossip period), send O(N) gossip message
• T=logN * tg
• L=N/tg=N*logN/T
What’s the Best/Optimal we can do?
(Slide changed after lecture)
• Worst case load L* per member in the group (messages per second)
– as a function of T, PM(T), N
– Independent Message Loss probability pml
• $L^* = \frac{\log(PM(T))}{\log(p_{ml})} \cdot \frac{1}{T}$
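To get a feel for the bound, the sketch below evaluates L* next to the all-to-all and gossip loads quoted on the earlier slides (N/T and N*logN/T). The parameter values are made up for illustration, and the natural logarithm in the gossip load is an assumption, since the slides do not fix a base.

```python
import math


def optimal_load(pm_t: float, p_ml: float, t: float) -> float:
    """L* = log(PM(T)) / log(p_ml) * 1/T, in messages per time unit."""
    return math.log(pm_t) / math.log(p_ml) / t


def all_to_all_load(n: int, t: float) -> float:
    return n / t


def gossip_load(n: int, t: float) -> float:
    return n * math.log(n) / t  # assumption: natural log


T = 1.0      # detection time bound (illustrative)
PM_T = 1e-3  # tolerated false detection probability (illustrative)
P_ML = 0.1   # independent message loss probability (illustrative)

for n in (100, 10_000):
    print(f"N={n:>6}: L*={optimal_load(PM_T, P_ML, T):.2f}  "
          f"all-to-all={all_to_all_load(n, T):.0f}  gossip={gossip_load(n, T):.0f}")
```

The optimal load does not change with N, while both heartbeating loads grow at least linearly with the group size.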
Heartbeating
• Optimal L is independent of N (!)
• All-to-all and gossip-based: sub-optimal
– L=O(N/T)
– try to achieve simultaneous detection at all processes
– fail to distinguish Failure Detection and Dissemination components

Key:
• Separate the two components
• Use a non heartbeat-based Failure Detection Component
Next
• Is there a better failure detector?

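SWIM Failure Detector Protocol
[Figure: timeline of one protocol period (= T’ time units) at pi.]
• pi picks a random pj and sends it a ping; pj replies with an ack
• If the ping or its ack is lost (X), pi picks K random processes and sends each a ping-req
• Each of the K processes pings pj and relays pj’s ack back to pi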
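One protocol period at pi can be sketched as a single function. The network is abstracted into a caller-supplied predicate `reachable(a, b)` that says whether a ping from a to b and its ack both get through; that predicate, the function name `swim_probe`, and the string verdicts are assumptions for illustration rather than part of the protocol description.

```python
import random


def swim_probe(pi, members, k, reachable, rng=random):
    """Run one protocol period at pi; return (target, verdict).

    members must contain pi and at least one other process.
    """
    others = [m for m in members if m != pi]
    pj = rng.choice(others)
    if reachable(pi, pj):                      # direct ping was acked
        return pj, "alive"
    candidates = [m for m in others if m != pj]
    helpers = rng.sample(candidates, min(k, len(candidates)))
    for helper in helpers:                     # indirect probes via ping-req
        if reachable(pi, helper) and reachable(helper, pj):
            return pj, "alive"
    return pj, "failed"                        # no ack within this period


crashed = {"c"}


def reachable(a, b):
    return a not in crashed and b not in crashed


rng = random.Random(1)
for _ in range(5):
    print(swim_probe("a", ["a", "b", "c", "d"], k=2, reachable=reachable, rng=rng))
```

The indirect probes are what distinguish a crashed pj from a lossy or congested path between pi and pj: a single bad link does not lead to a failure verdict as long as one helper can reach both ends.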
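SWIM versus Heartbeating
[Figure: first detection time plotted against process load, for a fixed false positive rate and message loss rate.]
• Heartbeating: O(N) first detection time at constant process load, or constant first detection time at O(N) process load
• SWIM: constant first detection time and constant process load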
SWIM Failure Detector
Parameter             SWIM
First Detection Time  • Expected $\frac{e}{e-1}$ periods
                      • Constant (independent of group size)
Process Load          • Constant per period
                      • < 8 L* for 15% loss
False Positive Rate   • Tunable (via K)
                      • Falls exponentially as load is scaled
Completeness          • Deterministic time-bounded
                      • Within O(log(N)) periods w.h.p.
Accuracy, Load
• PM(T) is exponential in -K. Also depends on pml (and pf)
– See paper
• $\frac{L}{L^*} < 28$ and $\frac{E[L]}{L^*} < 8$ for up to 15% loss rates
Detection Time
• Prob. of being pinged in T’ = $1 - \left(1 - \frac{1}{N}\right)^{N-1} \approx 1 - e^{-1}$ for large N
• $E[T] = T' \cdot \frac{e}{e-1}$
• Completeness: Any alive member detects failure
– Eventually
– By using a trick: within worst case O(N) protocol periods
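The limit can be checked numerically. The sketch treats the number of periods until the first ping as geometrically distributed, so its mean is the reciprocal of the per-period probability; the function names are hypothetical.

```python
import math


def prob_pinged(n: int) -> float:
    """Probability that a given process is pinged by someone in one period."""
    return 1 - (1 - 1 / n) ** (n - 1)


def expected_detection_periods(n: int) -> float:
    """Mean of a geometric distribution with success probability prob_pinged(n)."""
    return 1 / prob_pinged(n)


print(f"limit: p = {1 - math.exp(-1):.4f}, E[T]/T' = {math.e / (math.e - 1):.4f}")
for n in (10, 100, 10_000):
    print(f"N={n:>6}: p = {prob_pinged(n):.4f}, "
          f"E[T]/T' = {expected_detection_periods(n):.4f}")
```

The values settle close to the limit quickly, which is why the expected first detection time is described as constant and independent of group size.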
Next
• How do failure detectors fit into the big picture of a group membership protocol?
• What are the missing blocks?
Group Membership Protocol
[Figure: processes connected by an unreliable communication network; crash-stop failures only.]
I. pj crashed
II. Failure Detector: some process pi finds out quickly
III. Dissemination
Dissemination Options
• Multicast (Hardware / IP)
– unreliable
– multiple simultaneous multicasts
• Point-to-point (TCP / UDP)
– expensive
• Zero extra messages: Piggyback on Failure Detector messages
– Infection-style Dissemination
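Infection-style Dissemination
[Figure: one protocol period (= T time units) at pi, with membership information piggybacked on the failure detector messages.]
• pi pings a random pj and expects an ack
• On a lost ping or ack (X), pi sends ping-req to K random processes, which ping pj and relay its ack
• The ping, ping-req, and ack messages carry piggybacked membership information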
Suspicion Mechanism
• False detections, due to
– Perturbed processes
– Packet losses, e.g., from congestion
• Indirect pinging may not solve the problem
• Key: suspect a process before declaring it as failed in the group
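Suspicion Mechanism
pi:: State Machine for pj view element
[Figure: state machine with three states: Alive, Suspected, Failed.]
• Alive → Suspected: FD:: pi ping failed, or Dissmn::(Suspect pj); pi disseminates (Suspect pj)
• Suspected → Alive: FD:: pi ping success, or Dissmn::(Alive pj); pi disseminates (Alive pj)
• Suspected → Failed: Timeout; pi disseminates (Failed pj)
• Alive or Suspected → Failed: Dissmn::(Failed pj)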
Suspicion Mechanism
• Distinguish multiple suspicions of a process
– Per-process incarnation number
– Inc # for pi can be incremented only by pi
• e.g., when it receives a (Suspect, pi) message
– Somewhat similar to DSDV
• Higher inc# notifications over-ride lower inc#’s
• Within an inc#: (Suspect, inc #) > (Alive, inc #)
• (Failed, inc #) overrides everything else
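The override rules above can be written as one comparison function. The tuple layout (status, incarnation number) and the name `overrides` are illustrative assumptions; the ordering itself follows the three rules on this slide.

```python
RANK = {"alive": 0, "suspect": 1}  # within one incarnation, Suspect beats Alive


def overrides(new, old) -> bool:
    """Return True if notification `new` should replace the stored `old`.

    Both arguments are (status, incarnation) pairs about the same process.
    """
    new_status, new_inc = new
    old_status, old_inc = old
    if old_status == "failed":
        return False                  # Failed is final: nothing overrides it
    if new_status == "failed":
        return True                   # Failed overrides everything else
    if new_inc != old_inc:
        return new_inc > old_inc      # higher incarnation number wins
    return RANK[new_status] > RANK[old_status]


print(overrides(("suspect", 3), ("alive", 3)))  # same inc#: Suspect > Alive
print(overrides(("alive", 4), ("suspect", 3)))  # pj refuted with a higher inc#
print(overrides(("alive", 9), ("failed", 1)))   # Failed cannot be undone
```

Because only the process itself can raise its incarnation number, an (Alive, higher inc#) message is proof that it was running after the suspicion was raised.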
Wrap Up
• Failures the norm, not the exception in datacenters
• Every distributed system uses a failure detector
• Many distributed systems use a membership service
• Ring failure detection underlies
– IBM SP2 and many other similar clusters/machines
• Gossip-style failure detection underlies
– Amazon EC2/S3 (rumored!)
