01-da24-Introduction

The document outlines a course on Distributed Algorithms, focusing on the study of algorithms for distributed systems, their specifications, assumptions, and implementations. It covers historical contributions to distributed algorithms, the importance of reliable communication, and various types of algorithms such as reliable broadcast and consensus. The course emphasizes the distinction between message passing and shared memory algorithms and includes practical applications in modern distributed systems like those used by Google.

Uploaded by

Andrea Grillo


Distributed Algorithms

Prof R. Guerraoui

Exam 70% + Project 30%

Reference book (Springer Verlag): Introduction to Reliable (and Secure) Distributed Programming

M. Al-Khawarizmi ~9th century:

inventor of the zero, the decimal
system, Arithmetic and Algebra

A. Turing: one machine to rule them all

3
What is an algorithm?
An ordered set of elementary instructions

All execute on the same Turing machine

Complexity measures the number of instructions (variables)

Really?
4
In short
We study algorithms for distributed systems

A new way of thinking about algorithms and their complexity

5
Distributed algorithms
E. Dijkstra (concurrent OS) ~60’s

L. Lamport: “a distributed system is one that stops your application because a machine you have never heard of crashed” ~70’s

J. Gray (transactions) ~70’s

N. Lynch (consensus) ~80’s

6
Warning

• This course is complementary to the course Concurrent Algorithms

• Here we study message-passing algorithms, whereas the other course focuses on shared-memory algorithms

7
Overview

(1) Why? Motivation

(2) Where? Between the network and the application

(3) How? (3.1) Specifications, (3.2) assumptions, and (3.3) algorithms

8
A distributed system

[Figure: nodes of a distributed system (labels A and C)]
9
Clients-server

[Figure: Client A and Client B communicating with a Server]
10
Multiple servers
(genuine distribution, P2P, decentralization)

[Figure: three servers, Server A, Server B, and Server C]
11
The optimistic view

Concurrency => speed (load-balancing)

Partial failures => high-availability

12
The pessimistic view

• Concurrency (interleaving) => incorrectness

• Partial failures => incorrectness

13
Distributed algorithms
(Today: Google)

Hundreds of thousands of machines connected

A Google job involves 2000 machines

10 machines go down per day

14
Satoshi Nakamoto
(2008) Nick Szabo

2009: 0.005 $

2016: 600 $

2020: 10000 $

15
Overview

(1) Why? Motivation

(2) Where? Between the network and the application

(3) How? (3.1) Specifications, (3.2) assumptions, and (3.3) algorithms

16
Distributed systems

17
Distributed systems

The application needs underlying services for distributed interaction
The network is not enough
Reliability guarantees (e.g., TCP) are only offered for communication among pairs of processes, i.e., one-to-one communication (client-server)

18
Content of this course

Reliable broadcast
Causal order broadcast
Shared memory
Consensus
Total order broadcast
Atomic commit
Terminating reliable broadcast
……
19
Reliable distributed services
Example 1: reliable broadcast
Ensure that a message sent to a
group of processes is received
(delivered) by all or none
Example 2: atomic commit
Ensure that the processes reach a
common decision on whether to
commit or abort a transaction
20
Underlying services
(1): processes (abstracting computers)

(2): channels (abstracting networks)

(3): failure detectors (abstracting time)

21
Processes
• The distributed system is made of a finite set of processes: each process models a sequential program
• Processes are denoted p1, ..., pN or p, q, r
• Processes have unique identities and know each other
• Every pair of processes is connected by a link through which the processes exchange messages
22
Processes
A process executes a step at every tick of its local clock. A step consists of a local computation (local event) and message exchanges with other processes (global event)

23
Processes
The program of a process is made of a finite set of modules (or components) organized as a software stack
Modules within the same process interact by exchanging events
upon event < Event1, att1, att2, ... > do
    // something
    trigger < Event2, att1, att2, ... >;
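
The upon/trigger notation above can be mimicked with a small event queue. The following is only a sketch under stated assumptions: the `EventBus` class and its method names are hypothetical helpers introduced for illustration, not part of the course material, and the two handlers stand in for two modules of the same process.

```python
from collections import defaultdict, deque


class EventBus:
    """Hypothetical helper: routes named events to handlers within one process."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.queue = deque()

    def upon(self, name, handler):
        # "upon event < name, ... > do handler"
        self.handlers[name].append(handler)

    def trigger(self, name, *attrs):
        # "trigger < name, attrs... >": enqueue the event; handlers run later.
        self.queue.append((name, attrs))

    def run(self):
        # Process events one at a time, in FIFO order, until none remain.
        while self.queue:
            name, attrs = self.queue.popleft()
            for handler in self.handlers[name]:
                handler(*attrs)


bus = EventBus()
log = []

# One module reacts to Event1 and triggers Event2.
bus.upon("Event1", lambda a, b: bus.trigger("Event2", a + b))
# Another module reacts to Event2.
bus.upon("Event2", lambda total: log.append(total))

bus.trigger("Event1", 1, 2)
bus.run()
```

Queuing events instead of calling handlers directly keeps each handler atomic, which matches the idea that a process models a sequential program.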

24
Modules of a process

[Figure: stacked modules of a process, with "request" and "indication (deliver)" labels on the interfaces between layers]

25
Overview
(1) Why? Motivation

(2) Where? Between the network and the application

(3) How? (3.1) Specifications, (3.2) assumptions, and (3.3) algorithms

26
Approach
Specifications: What is the service?
i.e., the problem ~ liveness + safety
Assumptions: What is the model, i.e.,
the power of the adversary?
Algorithms: How do we implement the
service? Where are the bugs (proof)?
What cost (complexity)?

27
Overview
(1) Why? Motivation

(2) Where? Between the network and the application

(3) How? (3.1) Specifications, (3.2) assumptions, and (3.3) algorithms

28
Liveness and safety
Safety is a property which states that
nothing bad should happen
Liveness is a property which states
that something good should happen
Any specification can be expressed in
terms of liveness and safety
properties (Lamport and Schneider)
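
One way to see the difference is to check properties against finite execution prefixes. The sketch below is an illustration under assumptions: a trace is modelled as a list of hypothetical `("send", m)` and `("deliver", m)` pairs, the safety property is "no message is delivered unless it was sent", and the liveness property is "every sent message is eventually delivered". A finite prefix can already violate the safety property, but it can never violate the liveness property, because it can always be extended.

```python
def violates_no_creation(trace):
    """Safety check: True if some message is delivered before it was sent."""
    sent = set()
    for kind, msg in trace:
        if kind == "send":
            sent.add(msg)
        elif kind == "deliver" and msg not in sent:
            return True
    return False


def undelivered(trace):
    """Messages sent but not (yet) delivered in this finite prefix."""
    sent, delivered = set(), set()
    for kind, msg in trace:
        (sent if kind == "send" else delivered).add(msg)
    return sent - delivered


# Safety is violated by a finite prefix: nothing that follows can repair it.
bad_prefix = [("deliver", "m1")]

# Liveness is only "pending" here: m2 is not delivered yet, but an extension
# of the prefix can still deliver it.
open_prefix = [("send", "m1"), ("send", "m2"), ("deliver", "m1")]
completed = open_prefix + [("deliver", "m2")]
```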

29
Liveness and safety

Example: Tell the truth

Having to say something is liveness

Not lying is safety

30
Specifications
Example 1: reliable broadcast
Ensure that a message sent to a
group of processes is received by all
or none
Example 2: atomic commit
Ensure that the processes reach a
common decision on whether to
commit or abort a transaction
31
Overview

(1) Why? Motivation

(2) Where? Between the network and the application

(3) How? (3.1) Specifications, (3.2) assumptions, and (3.3) algorithms

32
Overview

(1) Why? Motivation

(2) Where? Between the network and the
application
(3) How? (3.1) Specifications, (3.2)
assumptions, and (3.3) algorithms
3.2.1 Assumptions on processes and
channels
3.2.2 Failure detection
33
Processes

• A process either executes the algorithm assigned to it (steps) or fails
• Two kinds of failures are mainly considered:
  - Omissions: the process omits to send messages it is supposed to send (distracted)
  - Arbitrary: the process sends messages it is not supposed to send (malicious or Byzantine)

34
Processes

Crash-stop: a more specific case of omissions
A process that omits a message to a process omits all subsequent messages to all processes (permanent distraction): it crashes

35
Processes

By default, we shall assume a crash-stop model throughout this course; that is, unless specified otherwise, processes fail only by crashing (no recovery)

A correct process is a process that does not fail (that does not crash)

36
Processes and channels

Processes communicate by message passing through communication channels

Messages are uniquely identified and the message identifier includes the sender’s identifier

37
Fair-loss links

FL1. Fair-loss: If a message m is sent infinitely often by pi to pj, and neither pi nor pj crashes, then m is delivered infinitely often by pj
FL2. Finite duplication: If a message m is sent a finite number of times by pi to pj, m is delivered a finite number of times by pj
FL3. No creation: No message is delivered unless it was sent

38
Stubborn links

SL1. Stubborn delivery: if a process pi sends a message m to a correct process pj, and pi does not crash, then pj delivers m an infinite number of times
SL2. No creation: No message is delivered unless it was sent

39
Algorithm (sl)
Implements: StubbornLinks (sp2p).
Uses: FairLossLinks (flp2p).
upon event < sp2pSend, dest, m > do
    while (true) do
        trigger < flp2pSend, dest, m >;
upon event < flp2pDeliver, src, m > do
    trigger < sp2pDeliver, src, m >;
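
The retransmission idea of Algorithm (sl) can be tried out in a small simulation. This is a sketch under assumptions, not the course's implementation: the `FairLossLink` and `StubbornLink` classes are hypothetical, loss is modelled as an independent random drop with probability `loss`, and the unbounded "while (true)" loop is replaced by an explicit `tick()` that performs one retransmission round so the program terminates.

```python
import random


class FairLossLink:
    """Toy fair-loss link: each send is dropped with probability `loss`."""

    def __init__(self, loss, seed=0):
        self.loss = loss
        self.rng = random.Random(seed)
        self.deliver_handlers = {}  # process id -> callback(src, m)

    def send(self, src, dest, m):
        # flp2pSend: the message may be lost, but is never created or altered.
        if self.rng.random() >= self.loss:
            self.deliver_handlers[dest](src, m)


class StubbornLink:
    def __init__(self, pid, fll):
        self.pid = pid
        self.fll = fll
        self.sent = []       # (dest, m) pairs to keep retransmitting
        self.delivered = []  # every sp2pDeliver indication, duplicates included
        fll.deliver_handlers[pid] = self._on_fl_deliver

    def send(self, dest, m):
        # sp2pSend: remember the message; retransmission happens in tick().
        self.sent.append((dest, m))

    def tick(self):
        # One iteration of the "while (true)" loop for every message sent so far.
        for dest, m in self.sent:
            self.fll.send(self.pid, dest, m)

    def _on_fl_deliver(self, src, m):
        # flp2pDeliver is passed up unchanged as sp2pDeliver.
        self.delivered.append((src, m))


net = FairLossLink(loss=0.7, seed=42)
p1 = StubbornLink("p1", net)
p2 = StubbornLink("p2", net)

p1.send("p2", "hello")
for _ in range(200):
    p1.tick()

count = p2.delivered.count(("p1", "hello"))
```

Even with most transmissions lost, the receiver sees the same message many times, which is why the next layer needs duplicate filtering.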
40
Reliable (Perfect) links
Properties
PL1. Validity: If pi and pj are correct,
then every message sent by pi to pj is
eventually delivered by pj
PL2. No duplication: No message is
delivered (to a process) more than once
PL3. No creation: No message is
delivered unless it was sent

41
Algorithm (pl)
Implements: PerfectLinks (pp2p).
Uses: StubbornLinks (sp2p).
upon event < Init > do
    delivered := ∅;
upon event < pp2pSend, dest, m > do
    trigger < sp2pSend, dest, m >;
upon event < sp2pDeliver, src, m > do
    if m ∉ delivered then
        trigger < pp2pDeliver, src, m >;
        add m to delivered;
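
The duplicate filtering in Algorithm (pl) amounts to a set-membership test. The sketch below is a minimal illustration with assumed names: the `PerfectLink` class and the `sp2p_send` callback (standing in for the stubborn link underneath) are hypothetical. Since message identifiers are assumed to include the sender's identifier, the pair `(src, m)` is used as the key in `delivered`.

```python
class PerfectLink:
    def __init__(self, sp2p_send):
        self.sp2p_send = sp2p_send  # hypothetical stand-in for the stubborn link
        self.delivered = set()      # delivered := ∅
        self.indications = []       # pp2pDeliver indications, in order

    def send(self, dest, m):
        # pp2pSend simply forwards to sp2pSend.
        self.sp2p_send(dest, m)

    def on_sp2p_deliver(self, src, m):
        # Deliver each (sender, message) pair at most once.
        key = (src, m)
        if key not in self.delivered:
            self.delivered.add(key)
            self.indications.append((src, m))


outbox = []
pl = PerfectLink(lambda dest, m: outbox.append((dest, m)))
pl.send("p2", "m1")

# The stubborn link below keeps redelivering; only first copies get through.
for src, m in [("p1", "a"), ("p1", "a"), ("p3", "a"), ("p1", "b"), ("p1", "a")]:
    pl.on_sp2p_deliver(src, m)
```

Note that `delivered` grows without bound in this form; the sketch makes no attempt to garbage-collect it.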
42
Reliable links
We shall assume reliable links (also called perfect) throughout this course (unless specified otherwise)

Roughly speaking, reliable links ensure that messages exchanged between correct processes are not lost

43
Overview
(1) Why? Motivation
(2) Where? Between the network and the
application
(3) How? (3.1) Specifications, (3.2)
assumptions, and (3.3) algorithms
3.2.1 Processes and links
3.2.2 Failure Detection

44
Failure detection

A failure detector is a distributed oracle that provides processes with suspicions about crashed processes
It is implemented using (i.e., it encapsulates) timing assumptions
Depending on the timing assumptions, the suspicions can be accurate or not

45
Failure detection

A failure detector module is defined by events and properties
Events
Indication: <crash, p>
Properties:
Completeness
Accuracy

46
Failure detection

Perfect:
Strong Completeness: Eventually, every process that
crashes is permanently suspected by every correct process
Strong Accuracy: No process is suspected before it crashes

Eventually Perfect:
Strong Completeness
Eventual Strong Accuracy: Eventually, no correct process is
ever suspected

47
Failure detection

Algorithm:
(1) Processes periodically send heartbeat messages
(2) A process sets a timeout based on the worst-case round trip of a message exchange
(3) A process suspects another process if it times out on that process
(4) A process that delivers a message from a suspected process revises its suspicion and doubles its timeout
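
Steps (1) to (4) can be sketched as a small class driven by explicit timestamps. Everything named here is an assumption made for illustration: the `EventuallyPerfectFD` class, its methods, and the use of an integer `now` in place of a real clock. Sending heartbeats over the network is left out; `on_heartbeat` is called whenever one arrives.

```python
class EventuallyPerfectFD:
    def __init__(self, peers, initial_timeout):
        # Step (2): one timeout per monitored process.
        self.timeout = {p: initial_timeout for p in peers}
        self.last_heard = {p: 0 for p in peers}
        self.suspected = set()

    def on_heartbeat(self, p, now):
        # Step (1), receiving side: record when p was last heard from.
        self.last_heard[p] = now
        if p in self.suspected:
            # Step (4): the suspicion was wrong, so revise it and double the timeout.
            self.suspected.discard(p)
            self.timeout[p] *= 2

    def check(self, now):
        # Step (3): suspect every process whose heartbeat is overdue.
        for p, t in self.last_heard.items():
            if now - t > self.timeout[p]:
                self.suspected.add(p)
        return set(self.suspected)


fd = EventuallyPerfectFD(["q", "r"], initial_timeout=3)
fd.on_heartbeat("q", 1)
fd.on_heartbeat("r", 1)

first = fd.check(5)      # both heartbeats are overdue by now
fd.on_heartbeat("q", 6)  # q was only slow: unsuspect it, timeout becomes 6
second = fd.check(8)     # r stays silent and remains suspected
```

Doubling the timeout after each false suspicion is what lets the detector stop making mistakes once the timing bounds hold, without knowing those bounds in advance.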

48
Timing assumptions

Synchronous:
Processing: the time it takes for a process to execute a step is bounded and known
Delays: there is a known upper bound on the time it takes for a message to be received
Clocks: the drift between a local clock and the global real-time clock is bounded and known
Eventually Synchronous: the timing assumptions hold eventually
Asynchronous: no assumption
49
Overview

(1) Why? Motivation

(2) Where? Between the network and the application

(3) How? (3.1) Specifications, (3.2) assumptions, and (3.3) algorithms

50
Algorithmic modules of a process
[Figure: stacked modules of a process, with "request" and "indication (deliver)" labels on the interfaces between layers]

51
Algorithms (representation)

p1
m3
m1

p2
m2

52
Algorithms (representation)

p1
m1

p2
m2

p3 crash

53
For every abstraction

(A) We assume a crash-stop system with a perfect failure detector (fail-stop)
We design algorithms

(B) We try to make a weaker assumption
We revisit the algorithms

54
Content of the course
Reliable broadcast
Causal order broadcast
Shared memory
Consensus
Total order broadcast
Atomic commit
Leader election
Terminating reliable broadcast
View synchronous broadcast
