Lecture 04
When you have 120 servers in the DC, the mean time to failure (MTTF) of the next machine is about 1 month.
When you have 12,000 servers in the DC, the MTTF drops to about 7.2 hours!
[Figure: 1000s of processes communicating over an unreliable network]
Group Membership Protocol
[Figure: process group over an unreliable communication network; crash-stop failures only]
I. pj crashes
II. Failure Detector: some process pi finds out quickly that pj crashed
III. Dissemination
Next
• How do you design a group membership protocol?
I. pj crashes
• Nothing we can do about it!
• A frequent occurrence
• Common case rather than exception
• Frequency goes up linearly with size of datacenter
II. Distributed Failure Detectors: Desirable Properties
• Completeness = each failure is detected
• Accuracy = there is no mistaken detection
• Speed
– Time to first detection of a failure
• Scale
– Equal Load on each member
– Network Message Load
Distributed Failure Detectors: Properties
• Completeness
• Accuracy
  Impossible together in lossy networks [Chandra and Toueg]
  (If possible, then can solve consensus!)
• Speed
  – Time to first detection of a failure
• Scale
  – Equal Load on each member
  – Network Message Load
What Real Failure Detectors Prefer
• Completeness: guaranteed
• Accuracy: partial/probabilistic guarantee
• Speed
  – Time to first detection of a failure (time until some process detects the failure)
• Scale
  – Equal Load on each member
  – Network Message Load (no bottlenecks/single failure point)
Failure Detector Properties
• Completeness: in spite of arbitrary simultaneous process failures
• Accuracy
• Speed
  – Time to first detection of a failure
• Scale
  – Equal Load on each member
  – Network Message Load
Centralized Heartbeating
[Figure: every process pi sends "pi, Heartbeat Seq. l++" to a central process pj, which becomes a hotspot]
• Heartbeats sent periodically
• If a heartbeat is not received from pi within the timeout, mark pi as failed
Ring Heartbeating
[Figure: processes arranged in a ring; each pi sends "pi, Heartbeat Seq. l++" to its ring neighbor(s) pj]
• Unpredictable on simultaneous multiple failures
All-to-All Heartbeating
[Figure: each process pi sends "pi, Heartbeat Seq. l++" to every other process pj]
• Equal load per member
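As a concrete illustration of the heartbeating variants above, here is a minimal sketch of the receive-side bookkeeping each member would run under all-to-all heartbeating; the class name, timeout value, and message shape are assumptions for illustration, not from the lecture.

```python
import time

# Sketch of the receive side of all-to-all heartbeating.
# The timeout value is illustrative; real deployments tune it.
HEARTBEAT_TIMEOUT = 5.0  # seconds without a heartbeat before marking a member failed

class HeartbeatTable:
    def __init__(self, members):
        now = time.time()
        # For each known member: (last heartbeat sequence number, last time heard from).
        self.last_seen = {m: (0, now) for m in members}

    def on_heartbeat(self, sender, seq):
        """Record a periodic heartbeat (sender, seq) received from some member."""
        last_seq, _ = self.last_seen.get(sender, (0, 0.0))
        if seq > last_seq:  # ignore stale or duplicate heartbeats
            self.last_seen[sender] = (seq, time.time())

    def failed_members(self):
        """Members not heard from within the timeout are marked as failed."""
        now = time.time()
        return [m for m, (_, t) in self.last_seen.items()
                if now - t > HEARTBEAT_TIMEOUT]
```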
Next
• How do we increase the robustness of all-to-all heartbeating?
Gossip-style Heartbeating
[Figure: pi periodically gossips an array of heartbeat seq. numbers for a member subset]
• Good accuracy properties
Gossip-Style Failure Detection
[Figure: each node maintains a membership list of entries (Address, Heartbeat Counter, Time (local)); example lists at nodes 1–4 are shown]
Protocol:
• Nodes periodically gossip their membership list
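A minimal sketch of the membership-list merge this protocol relies on, assuming each entry is (heartbeat counter, local time the counter last increased); the helper names and the T_FAIL value are illustrative.

```python
import time

T_FAIL = 30.0  # seconds without a heartbeat-counter increase before suspecting a member

def merge(local, received):
    """Merge a gossiped membership list into the local one.

    Both lists map address -> (heartbeat counter, local time). For each member,
    keep whichever entry has the higher heartbeat counter, stamped with *local* time.
    """
    now = time.time()
    for addr, (hb, _) in received.items():
        if hb > local.get(addr, (-1, 0.0))[0]:
            local[addr] = (hb, now)
    return local

def suspected_failed(local):
    """Members whose heartbeat counter has not increased for T_FAIL seconds."""
    now = time.time()
    return [addr for addr, (_, t) in local.items() if now - t > T_FAIL]
```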
Multi-level Gossiping
[Figure: hierarchical topology, with a core router connecting subnets of N/2 nodes each]
• Network topology is hierarchical
• Random gossip target selection => core routers face O(N) load (why?)
• Fix: in subnet i, which contains ni nodes, pick gossip target in your subnet with probability (1 - 1/ni)
• Router load = O(1)
• Dissemination time = O(log(N))
• What about latency for multi-level topologies?
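A sketch of the target-selection fix above: stay inside your own subnet with probability (1 - 1/ni), otherwise gossip across the router. The data layout (subnet id -> node list) and the fallback for a lone subnet are assumptions for illustration.

```python
import random

def pick_gossip_target(self_id, my_subnet, subnets):
    """Pick a gossip target so that cross-subnet (router) traffic stays O(1).

    `subnets` maps subnet id -> list of node ids; `my_subnet` is this node's subnet.
    With probability 1 - 1/ni the target is chosen inside the local subnet.
    """
    local = [n for n in subnets[my_subnet] if n != self_id]
    n_i = len(subnets[my_subnet])
    if local and random.random() < 1.0 - 1.0 / n_i:
        return random.choice(local)  # stay inside the subnet
    # Otherwise cross the router: pick a node from some other (non-empty) subnet.
    others = [s for s in subnets if s != my_subnet and subnets[s]]
    if not others:
        return random.choice(local) if local else None
    return random.choice(subnets[random.choice(others)])
```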
Analysis/Discussion
• What happens if the gossip period Tgossip is decreased?
• A single heartbeat takes O(log(N)) time to propagate. So N heartbeats take:
  – O(log(N)) time to propagate, if the bandwidth allowed per node is O(N)
  – O(N·log(N)) time to propagate, if the bandwidth allowed per node is only O(1)
  – What about O(k) bandwidth?
• What happens to Pmistake (false positive rate) as Tfail and Tcleanup are increased?
• Tradeoff: false positive rate vs. detection time vs. bandwidth
Next
• So, is this the best we can do? What is the best we can do?
Failure Detector Properties …
• Completeness
• Accuracy
• Speed
– Time to first detection of a failure
• Scale
– Equal Load on each member
– Network Message Load
…Are Application-Defined Requirements
• Completeness: guarantee always
• Accuracy: probability PM(T) of a mistaken detection
• Speed: T time units
  – Time to first detection of a failure
• Scale
  – Equal Load on each member
  – Network Message Load (N*L: compare this across protocols)
All-to-All Heartbeating
[Figure: every T units, each pi sends "pi, Heartbeat Seq. l++" to all other processes]
• Load per member: L = N/T
Gossip-style Heartbeating
[Figure: every tg units (= gossip period), pi sends an O(N)-size gossip message with an array of heartbeat seq. numbers for a member subset]
• Detection time: T = log(N) * tg
• Load per member: L = N/tg = N*log(N)/T
What’s the Best/Optimal we can do?
• Optimal worst-case load: L* = [log(PM(T)) / log(pml)] · (1/T)
  (pml = probability of message loss)
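To make the formula concrete, a small illustrative calculation; the values of PM(T), pml, and T below are made up purely to show the arithmetic.

```python
import math

# Hypothetical requirements: false-detection probability PM(T) = 1e-6 within
# T = 10 s, under a per-message loss probability pml = 0.05.
PM_T, p_ml, T = 1e-6, 0.05, 10.0

L_star = math.log(PM_T) / math.log(p_ml) * (1.0 / T)
print(f"optimal load L* ≈ {L_star:.2f} messages/s per member")
# ≈ 0.46 msgs/s per member -- and note that L* does not depend on the group size N
```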
Heartbeating
• Optimal L is independent of N (!)
• All-to-all and gossip-based heartbeating are sub-optimal:
  – L = O(N/T)
  – They try to achieve simultaneous detection at all processes
  – They fail to distinguish the Failure Detection and Dissemination components
Key:
Separate the two components
Use a non-heartbeat-based Failure Detection component
Next
• Is there a better failure detector?
SWIM Failure Detector Protocol
[Figure: within each protocol period of T' time units, pi pings a random pj and waits for an ack; if the direct ping or ack is lost (X), pi sends ping-req to K randomly selected processes, which ping pj and relay its ack back to pi]
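A sketch of one SWIM protocol period, assuming transport helpers send_ping(target) and send_ping_req(helper, target) that return True iff an ack comes back within the period; those names, and K = 3, are illustrative assumptions.

```python
import random

K = 3  # number of members asked to probe indirectly

def swim_probe_once(self_id, members, send_ping, send_ping_req):
    """Run one protocol period: direct ping to a random member, then indirect
    ping-req via K other random members, and return the verdict."""
    candidates = [m for m in members if m != self_id]
    if not candidates:
        return None
    target = random.choice(candidates)

    # Direct probe: ping the randomly chosen target and wait for its ack.
    if send_ping(target):
        return (target, "alive")

    # Indirect probe: ask K random members to ping the target and relay the ack.
    pool = [m for m in candidates if m != target]
    helpers = random.sample(pool, min(K, len(pool)))
    if any(send_ping_req(h, target) for h in helpers):
        return (target, "alive")

    # No direct or indirect ack within the protocol period.
    return (target, "suspected")
```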
SWIM versus Heartbeating
[Plot: First Detection Time vs. process load. SWIM achieves constant load and constant detection time; heartbeating pays either O(N) load or O(N) detection time]
• L/L* = 28 and E[L]/L* = 8, for up to 15% loss rates
Detection Time
• Prob. of being pinged in T' = 1 - (1 - 1/N)^(N-1) ≈ 1 - e^(-1)
• E[T] = T' · e/(e - 1)
• Completeness: any alive member detects failure
  – Eventually
  – By using a trick: within worst case O(N) protocol periods
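Since e/(e-1) ≈ 1.58, the expected first-detection time works out to roughly 1.58 protocol periods, independent of N; a one-line numeric check:

```python
import math

# E[T] = (e / (e - 1)) * T'  ->  about 1.58 protocol periods, independent of N
print(f"E[T] ≈ {math.e / (math.e - 1):.2f} * T'")
```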
Next
• How do failure detectors fit into the big picture of a group membership protocol?
• What are the missing blocks?
Group Membership Protocol
[Figure: process group over an unreliable communication network; crash-stop failures only]
I. pj crashes
II. Failure Detector: some process pi finds out quickly that pj crashed
III. Dissemination
Dissemination Options
• Multicast (Hardware / IP)
– unreliable
– multiple simultaneous multicasts
• Point-to-point (TCP / UDP)
– expensive
• Zero extra messages: piggyback on Failure Detector messages
  – Infection-style Dissemination
Infection-style Dissemination
[Figure: the same ping / ping-req / ack exchanges as the SWIM failure detector, with a protocol period of T time units; membership information is piggybacked on these messages]
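A sketch of what the piggybacking might look like: recently learned membership updates ride along on ping traffic, so dissemination costs zero extra messages. The message layout, the update tuple (member, status, incarnation), and PIGGYBACK_LIMIT are illustrative assumptions.

```python
PIGGYBACK_LIMIT = 6  # cap on how many updates ride on any one message

def make_ping(sender, target, recent_updates):
    """Build a ping message that carries recent membership updates for free."""
    return {
        "type": "ping",
        "from": sender,
        "to": target,
        # Attach a bounded number of recent (member, status, incarnation) updates.
        "piggyback": recent_updates[:PIGGYBACK_LIMIT],
    }

def absorb_piggyback(msg, membership):
    """Merge piggybacked updates into the local membership table.

    Only the 'higher incarnation wins' rule is applied here; the full override
    rules appear with the suspicion mechanism below.
    """
    for member, status, inc in msg["piggyback"]:
        current = membership.get(member)
        if current is None or inc > current[1]:
            membership[member] = (status, inc)
    return membership
```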
Suspicion Mechanism
• False detections, due to
– Perturbed processes
– Packet losses, e.g., from congestion
• Indirect pinging may not solve the problem
• Key: suspect a process before declaring it as failed in the group
Suspicion Mechanism
[Figure: pi's state machine for the pj view element.
  Alive → Suspected: on FD:: pi ping failed, or Dissmn::(Suspect pj)
  Suspected → Alive: on FD:: pi ping success, or Dissmn::(Alive pj)
  Suspected → Failed: on timeout, or Dissmn::(Failed pj)
  Each transition also triggers dissemination: Dissmn (Suspect pj), Dissmn (Alive pj), Dissmn (Failed pj)]
Suspicion Mechanism
• Distinguish multiple suspicions of a process
– Per-process incarnation number
– Inc # for pi can be incremented only by pi
• e.g., when it receives a (Suspect, pi) message
– Somewhat similar to DSDV
• Higher inc# notifications over-ride lower inc#’s
• Within an inc#: (Suspect, inc #) > (Alive, inc #)
• (Failed, inc #) overrides everything else
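The override rules above can be captured in a few lines; this sketch assumes notifications of the form (status, incarnation number) and is only an illustration.

```python
from enum import Enum

class Status(Enum):
    ALIVE = 0
    SUSPECT = 1
    FAILED = 2

def overrides(new, old):
    """Return True if notification `new` = (status, inc#) should replace `old`."""
    new_status, new_inc = new
    old_status, old_inc = old
    # (Failed, inc #) overrides everything else.
    if new_status is Status.FAILED:
        return True
    if old_status is Status.FAILED:
        return False
    # Higher inc# notifications override lower inc#'s.
    if new_inc != old_inc:
        return new_inc > old_inc
    # Within an inc#: (Suspect, inc #) > (Alive, inc #).
    return new_status is Status.SUSPECT and old_status is Status.ALIVE

# Example: a fresher Alive (inc# 5) overrides an older Suspect (inc# 4),
# but an Alive cannot override a Suspect with the same inc#.
assert overrides((Status.ALIVE, 5), (Status.SUSPECT, 4))
assert not overrides((Status.ALIVE, 4), (Status.SUSPECT, 4))
```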
Wrap Up
• Failures are the norm, not the exception, in datacenters
• Every distributed system uses a failure detector
• Many distributed systems use a membership service