Processes and Processors in Distributed Systems
Nearest neighbor algorithm: gradient
One of the major issues is defining a reasonable contour of gradients. The following is one model. The propagated pressure of a processor u, p(u), is defined as:
If u is lightly loaded, p(u) = 0.
Otherwise, p(u) = 1 + min{ p(v) | v ∈ A(u) }, where A(u) is the set of u's neighbors.
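Since p(u) is just the hop distance from u to the nearest lightly loaded processor, it can be computed by a multi-source breadth-first search. Below is a minimal Python sketch; the adjacency map `neighbors` and the node names are illustrative, not part of the model.

    from collections import deque

    def propagated_pressure(neighbors, lightly_loaded):
        # p(u) = 0 for lightly loaded nodes; otherwise 1 + min over A(u).
        # A multi-source BFS from the lightly loaded nodes computes this:
        # the first visit to a node yields its minimum distance, i.e. p(u).
        pressure = {u: 0 for u in lightly_loaded}
        queue = deque(lightly_loaded)
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if v not in pressure:
                    pressure[v] = pressure[u] + 1
                    queue.append(v)
        return pressure

    # Example: a 4-node line a-b-c-d where only a is lightly loaded.
    neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    print(propagated_pressure(neighbors, ["a"]))  # {'a': 0, 'b': 1, 'c': 2, 'd': 3}

Nodes from which no lightly loaded processor is reachable never enter the map; their pressure is effectively infinite.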
Nearest neighbor algorithm: gradient
[Figure: an example network showing each node's load (left) and the resulting propagated pressure (right). A node is lightly loaded if its load is < 3; such nodes have p(u) = 0, and every other node's pressure is one plus the minimum pressure among its neighbors.]
Nearest neighbor algorithm: dimension exchange
[Figure: dimension exchange on a three-dimensional hypercube. In each step every node averages its load with its neighbor along one dimension; after all three dimensions have been swept, the initially uneven loads end up nearly uniform (about 5 per node here).]
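A minimal Python sketch of one dimension exchange sweep on a hypercube. Integer loads are split as evenly as possible, with one node of each pair keeping the remainder; that tie-breaking rule is an assumption for illustration, since the slides do not say how odd sums are split.

    def dimension_exchange(loads, dims):
        # One sweep over the dimensions of a 2**dims-node hypercube.
        # Along dimension d, node i pairs with node i XOR 2**d and the
        # two nodes split their combined load as evenly as possible.
        loads = list(loads)
        for d in range(dims):
            for i in range(len(loads)):
                j = i ^ (1 << d)          # neighbor across dimension d
                if i < j:                 # handle each pair once
                    half, rem = divmod(loads[i] + loads[j], 2)
                    loads[i], loads[j] = half + rem, half
        return loads

    print(dimension_exchange([12, 8, 1, 4, 6, 4, 0, 8], 3))
    # -> [6, 6, 5, 5, 6, 5, 5, 5]: the total of 43 is spread almost evenly

The starting loads echo the figure's numbers, but the exact node-to-value mapping is a guess.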
Nearest neighbor algorithm: dimension exchange extension
Fault tolerance: component faults
Transient faults: occur once and then disappear. E.g., a bird flying through the beam of a microwave transmitter may cause lost bits on some network; a retry may succeed.
Intermittent faults: occur, vanish, reappear, and so on. E.g., a loose contact on a connector.
Permanent faults: continue to exist until the fault is repaired. E.g., burnt-out chips, software bugs, and disk head crashes.
System failures
There are two types of processor faults:
1. Fail-silent faults: a faulty processor simply stops and does not respond.
2. Byzantine faults: a faulty processor continues to run but gives wrong answers.
Synchronous versus asynchronous systems
A system that, while working, always responds to a message within a known finite bound is said to be synchronous. Otherwise, it is asynchronous.
Use of redundancy
There are three kinds of fault tolerance approaches:
1. Information redundancy: extra bits are added to recover from garbled bits.
2. Time redundancy: redo the operation.
3. Physical redundancy: extra components are added. There are two ways to organize the extra equipment: active replication (all replicas operate at the same time) and primary backup (a backup takes over when the primary fails).
Fault Tolerance using active replication
[Figure: active replication with triple modular redundancy. Each element A, B, C is replicated three times (A1-A3, B1-B3, C1-C3), and voters V1-V9 between the stages forward the majority of their three inputs, masking a single faulty element per stage.]
How much replication is needed?
A system is said to be k fault tolerant if it can
survive faults in k components and still meet its
specifications.
k+1 processors can tolerate k fail-stop faults: even if k of them fail, the remaining one can still do the work. But 2k+1 processors are needed to tolerate k Byzantine faults: even if k processors send out wrong replies, the k+1 correct processors still form a majority, so a correct answer can be obtained by majority vote.
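A small Python sketch of the majority-vote argument: with 2k+1 replicas and at most k Byzantine ones, the k+1 correct replies always win. The names and values are illustrative.

    from collections import Counter

    def majority_vote(replies):
        # Return the value reported by a strict majority, or None.
        value, count = Counter(replies).most_common(1)[0]
        return value if count > len(replies) // 2 else None

    k = 2
    replies = [42] * (k + 1) + [7, 99]    # k+1 correct replies, k Byzantine
    assert majority_vote(replies) == 42   # the correct answer still wins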
Fault Tolerance using primary backup
[Figure: the primary-backup protocol. 1. The client sends a request to the primary. 2. The primary does the work and 3. sends an update to the backup. 4. The backup does the work and 5. sends an acknowledgement back, after which 6. the primary replies to the client.]
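The six steps can be sketched as two in-process Python objects. This only illustrates the message order, not a real distributed implementation; real systems must also handle the backup taking over when the primary crashes.

    class Backup:
        def __init__(self):
            self.state = {}
        def update(self, key, value):
            self.state[key] = value               # 4. the backup does the work
            return "ack"                          # 5. acknowledgement

    class Primary:
        def __init__(self, backup):
            self.state = {}
            self.backup = backup
        def handle_request(self, key, value):     # 1. request from the client
            self.state[key] = value               # 2. the primary does the work
            ack = self.backup.update(key, value)  # 3. update sent to the backup
            assert ack == "ack"
            return "done"                         # 6. reply to the client

    primary = Primary(Backup())
    print(primary.handle_request("x", 1))         # both replicas now hold x = 1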
Handling of Processor Faults
Backward recovery is based on checkpoints.
In the checkpointing method, two undesirable situations can occur:
Lost message: the state of process Pi indicates that it has sent a message m to process Pj, but Pj has no record of receiving it.
Orphan message: the state of process Pj shows that it has received a message m from process Pi, but the state of Pi shows that it never sent m to Pj.
A strongly consistent set of checkpoints consists of local checkpoints such that there is no orphan or lost message.
A consistent set of checkpoints consists of local checkpoints such that there is no orphan message.
Orphan message
[Figure: an orphan message. Pi fails and rolls back to its current checkpoint, which was taken before m was sent; Pj's current checkpoint records the receipt of m, so m becomes an orphan message.]
Domino effect
[Figure: the domino effect. When Pi fails and rolls back to its current checkpoint, a message to Pj becomes an orphan, forcing Pj to roll back as well; that rollback orphans an earlier message in the other direction, and the two processes cascade back through ever older checkpoints.]
Synchronous checkpointing
A processor Pi needs to take a checkpoint only if there is another process Pj that has taken a checkpoint recording the receipt of a message from Pi, while Pi has not recorded the sending of that message. In this way no orphan message will be generated.
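The rule reduces to a one-line predicate. A hedged Python sketch, assuming messages are identified by ids and each process records the ids it has sent or checkpointed (the sets are illustrative):

    def must_checkpoint(pi_sends_recorded, pj_receipts_checkpointed):
        # Pi must checkpoint if Pj's checkpoint records the receipt of a
        # message from Pi that Pi has not recorded sending; otherwise a
        # rollback of Pi would turn that message into an orphan.
        return any(m not in pi_sends_recorded
                   for m in pj_receipts_checkpointed)

    # Pj checkpointed the receipt of m2, but Pi only recorded sending m1:
    assert must_checkpoint({"m1"}, {"m1", "m2"}) is True
    assert must_checkpoint({"m1", "m2"}, {"m1", "m2"}) is False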
Asynchronous checkpointing
Each process takes its checkpoints
independently without any coordination.
Hybrid checkpointing
Synchronous checkpoints are established at a longer period, while asynchronous checkpoints are taken at a shorter period. That is, within one synchronous period there are several asynchronous periods.
Agreement in Faulty Systems: the two-army problem
Two blue armies must reach agreement to attack a red army. If only one blue army attacks, it will be slaughtered. They can communicate only over an unreliable channel: sending a messenger who is subject to capture by the red army.
It can be shown that they can never reach agreement on attacking.
Now assume the communication is perfect but the processors are not. The classical problem is called the Byzantine generals problem: there are N generals, M of whom are traitors. Can they reach an agreement?
Lamport's algorithm
[Figure: N = 4 generals with M = 1 traitor. Generals 1, 2, and 4 are loyal and send their true values; general 3 is the traitor and sends different values (x, y, z) to generals 1, 2, and 4.]
After the first round
1 got (1,2,x,4); 2 got (1,2,y,4);
3 got (1,2,3,4); 4 got (1,2,z,4)
After the second round
1 got (1,2,y,4), (a,b,c,d), (1,2,z,4)
2 got (1,2,x,4), (e,f,g,h), (1,2,z,4)
4 got (1,2,x,4), (1,2,y,4), (i,j,k,l)
Majority
1 got (1,2,_,4); 2 got (1,2,_,4); 4 got (1,2,_,4)
So all the good generals know that 3 is the bad
guy.
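The two rounds can be simulated directly. A Python sketch of the N = 4, M = 1 case; exactly how the traitor lies (the x/y-style values) is an arbitrary choice for illustration.

    from collections import Counter

    GENERALS = [1, 2, 3, 4]
    TRAITOR = 3
    loyal_value = {1: 1, 2: 2, 4: 4}     # each loyal general's own value

    def sends(sender, receiver):
        # Round 1: what `sender` tells `receiver` its value is.
        if sender == TRAITOR:
            return f"x{receiver}"        # the traitor lies differently to each
        return loyal_value[sender]

    round1 = {g: {s: sends(s, g) for s in GENERALS if s != g} for g in GENERALS}

    def relays(relay, sender, receiver):
        # Round 2: what `relay` claims `sender` said, when telling `receiver`.
        if relay == TRAITOR:
            return f"y{receiver}"        # the traitor relays garbage too
        return round1[relay][sender]

    for g in GENERALS:
        if g == TRAITOR:
            continue
        decided = {}
        for s in GENERALS:
            if s == g:
                continue
            claims = [round1[g][s]] + [relays(r, s, g)
                                       for r in GENERALS if r not in (g, s)]
            value, count = Counter(claims).most_common(1)[0]
            decided[s] = value if count > len(claims) // 2 else None
        print(f"general {g} decides {decided}")

Every loyal general decides {1: 1, 2: 2, 3: None, 4: 4} (its own entry omitted): they agree on one another's values and see no majority for general 3, the traitor.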
Result
If there are m faulty processors, agreement
can be achieved only if 2m+1 correct
processors are present, for a total of 3m+1.
Suppose n = 3 and m = 1. Agreement cannot be reached, as the following example shows.
[Figure: n = 3 generals with general 3 as the traitor; it sends x to general 1 and y to general 2.]
After the first round
1 got (1,2,x); 2 got (1,2,y); 3 got (1,2,3)
After the second round
1 got (1,2,y), (a,b,c)
2 got (1,2,x), (d,e,f)
There is no majority, so an agreement cannot be reached.
Agreement under different models
Turek and Shasha considered the following
parameters for the agreement problem.
1. The system can be synchronous (A=1) or asynchronous (A=0).
2. Communication delay can be either bounded (B=1) or unbounded
(B=0).
3. Messages can be either ordered (C=1) or unordered (C=0).
4. The transmission mechanism can be either point-to-point (D=0) or
broadcast (D=1).
A Karnaugh map for the agreement problem
         CD=00  CD=01  CD=11  CD=10
AB=00      0      0      1      0
AB=01      0      0      1      0
AB=11      1      1      1      1
AB=10      0      0      1      1
Minimizing the Boolean function we have the
following expression for the conditions under
which consensus is possible:
AB+AC+CD = True
(AB=1): Processors are synchronous and
communication delay is bounded.
(AC=1): Processors are synchronous and
messages are ordered.
(CD=1): Messages are ordered and the
transmission mechanism is broadcast.
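The minimized condition is easy to check in code. A short Python sketch that recomputes the whole Karnaugh map from it:

    from itertools import product

    def consensus_possible(A, B, C, D):
        # Turek and Shasha's minimized condition: AB + AC + CD.
        return (A and B) or (A and C) or (C and D)

    # Reproduce the 16 Karnaugh-map entries.
    for A, B, C, D in product((0, 1), repeat=4):
        print(f"A={A} B={B} C={C} D={D} -> {1 if consensus_possible(A, B, C, D) else 0}")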
REAL-TIME DISTRIBUTED SYSTEMS
What is a real-time system?
Real-time programs interact with the
external world in a way that involves time.
When a stimulus appears, the system must
respond to it in a certain way and before a
certain deadline. E.g. automated factories,
telephone switches, robots, automatic stock
trading system.
Distributed real-time systems structure
[Figure: the structure of a distributed real-time system. External devices are attached through sensors to computers (C), which are connected by a network and drive actuators; a device delivers a stimulus, to which the computers must respond in time.]
An external device generates a stimulus for the
computer, which must perform certain actions
before a deadline.
1. Periodic: a stimulus occurring regularly every T seconds, such as a computer in a TV set or VCR getting a new frame every 1/60 of a second.
2. Aperiodic: stimuli that are recurrent but not regular, as in the arrival of an aircraft in an air traffic controller's airspace.
3. Sporadic: stimuli that are unexpected, such as a device overheating.
Two types of RTS
Soft real-time systems: missing an
occasional deadline is all right.
Hard real-time systems: even a single
missed deadline in a hard real-time system
is unacceptable, as this might lead to loss
of life or an environmental catastrophe.
Design issues
Clock Synchronization - keeping the clocks in synchrony is a key issue.
Event-Triggered versus Time-Triggered
Systems
Predictability
Fault Tolerance
Language Support
Event-triggered real-time system
When a significant event in the outside world happens, it is detected by some sensor, which then causes the attached CPU to get an interrupt. Event-triggered systems are thus interrupt driven. Most real-time systems work this way.
Disadvantage: they can fail under conditions of heavy load, that is, when many events are happening at once. This event shower may overwhelm the computing system and bring it down, potentially causing serious problems.
Time-triggered real-time system
In this kind of system, a clock interrupt occurs every T milliseconds. At each clock tick, sensors are sampled and actuators are driven. No interrupts occur other than clock ticks.
T must be chosen carefully: if it is too small, there will be too many clock interrupts; if it is too large, serious events may not be noticed until it is too late.
An example to show the difference between the two
Consider an elevator controller in a 100-story building. Suppose the elevator is sitting on the 60th floor when someone pushes the call button on the 1st floor, and then someone else pushes the call button on the 100th floor. In an event-triggered system, the elevator will go down to the 1st floor and then up to the 100th floor. But in a time-triggered system, if both calls fall within one sampling period, the controller must decide whether to go up or down first, for example using a nearest-customer-first rule.
In summary, event-triggered designs give faster response at low load but more overhead and chance of failure at high load. Time-triggered designs have the opposite properties and are, furthermore, suitable only in a relatively static environment in which a great deal is known about system behavior in advance.
Predictability
One of the most important properties of any real-time system is that its behavior be predictable. Ideally, it should be clear at design time that the system can meet all of its deadlines, even at peak load: when event E is detected, the order in which the processes will run and the worst-case behavior of those processes should be known.
Fault Tolerance
Many real-time systems control safety-critical devices in
vehicles, hospitals, and power plants, so fault tolerance is
frequently an issue.
Primary-backup schemes are less popular because
deadlines may be missed during cutover after the primary
fails.
In a safety-critical system, it is especially important that
the system be able to handle the worst-case scenario. It is
not enough to say that the probability of three components
failing at once is so low that it can be ignored. Fault-
tolerant real-time systems must be able to cope with the
maximum number of faults and the maximum load at the
same time.
Language Support
In such a language, it should be easy to express the work as a collection of short tasks that can be scheduled independently.
The language should be designed so that the maximum execution time of every task can be computed at compile time. This requirement means that the language cannot support general while loops or recursion.
The language needs a way to deal with time itself.
The language should have a way to express minimum and maximum delays.
There should be a way to express what to do if an expected event does not occur within a certain interval.
Because periodic events play an important role, it would be useful to have a statement of the form every(25 msec){ ... } that causes the statements within the braces to be executed every 25 msec; a sketch of such a construct follows this list.
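A rough Python emulation of that construct, purely illustrative: a real real-time language would enforce the period and check execution times at compile time, whereas this sketch only sleeps between iterations.

    import time

    def every(period_ms):
        # Emulation of the proposed every(25 msec){ ... } statement:
        # the decorated body is executed once per period.
        def decorator(body):
            def run():
                next_tick = time.monotonic()
                while True:
                    body()
                    next_tick += period_ms / 1000.0
                    time.sleep(max(0.0, next_tick - time.monotonic()))
            return run
        return decorator

    @every(25)
    def control_step():
        pass    # sample sensors, compute, drive actuators

    # Calling control_step() now loops, running the body every 25 msec.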
Real-Time Communication
Ethernet cannot be used because it is not predictable.
A token ring LAN is predictable: the worst-case wait before a processor can send is bounded by kn byte times, where k is the number of machines and n is the length of a message in bytes.
An alternative to a token ring is the TDMA (Time
Division Multiple Access) protocol. Here traffic is
organized in fixed-size frames, each of which contains n
slots. Each slot is assigned to one processor, which may
use it to transmit a packet when its time comes. In this
way collisions are avoided, the delay is bounded, and each
processor gets a guaranteed fraction of the bandwidth.
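Both bounds are simple arithmetic. A Python sketch, ignoring token-passing overhead and slot-framing details; all names are illustrative:

    def token_ring_max_wait(k, n):
        # Worst case before a processor may transmit on a token ring:
        # each of the k machines first sends one n-byte message,
        # so the wait is bounded by k*n byte times.
        return k * n

    def tdma_guaranteed_share(slots_per_frame, my_slots=1):
        # A processor owning my_slots of the slots in each fixed-size
        # TDMA frame is guaranteed that fraction of the bandwidth.
        return my_slots / slots_per_frame

    print(token_ring_max_wait(k=10, n=1024))         # 10240 byte times
    print(tdma_guaranteed_share(slots_per_frame=8))  # 0.125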
Real-Time Scheduling
Hard real time versus soft real time
Preemptive versus nonpreemptive
scheduling
Dynamic versus static
Centralized versus decentralized
Dynamic Scheduling
1. Rate monotonic algorithm:
It works like this: in advance, each task is assigned a priority equal to its execution frequency. For example, a task that runs every 20 msec is assigned priority 50, and a task that runs every 100 msec is assigned priority 10. At run time, the scheduler always selects the highest-priority task to run, preempting the current task if need be.
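A minimal sketch of the priority assignment, using the slide's numbers; the task names are made up:

    def rm_priority(period_ms):
        # Rate monotonic: priority equals the task's execution frequency,
        # e.g. a 20 msec period gives priority 50, a 100 msec period 10.
        return 1000 / period_ms

    ready = [("video frame", 20), ("logging", 100)]
    ready.sort(key=lambda task: rm_priority(task[1]), reverse=True)
    print(ready[0])    # ('video frame', 20): the highest-priority task runs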
2. Earliest deadline first algorithm:
Whenever an event is detected, the scheduler adds it to the list of waiting tasks. This list is always kept sorted by deadline, closest deadline first.
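A sketch of the waiting list as a min-heap keyed by deadline; the task names and deadlines are illustrative:

    import heapq

    waiting = []                            # min-heap ordered by deadline
    heapq.heappush(waiting, (300, "update display"))
    heapq.heappush(waiting, (120, "read sensor"))
    print(heapq.heappop(waiting)[1])        # "read sensor": closest deadline first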
3. Least laxity algorithm:
This algorithm first computes for each task the amount of time it has to spare, called the laxity. For a task that must finish in 200 msec but still has 150 msec of work to run, the laxity is 200 - 150 = 50 msec. The algorithm then chooses the task with the least laxity, that is, the one with the least breathing room.
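A sketch of the laxity computation and selection rule, with made-up tasks:

    def laxity(deadline_ms, remaining_ms):
        # Spare time: a task due in 200 msec with 150 msec still to run
        # has laxity 200 - 150 = 50 msec.
        return deadline_ms - remaining_ms

    tasks = [("a", 200, 150), ("b", 400, 100)]   # (name, deadline, remaining)
    chosen = min(tasks, key=lambda t: laxity(t[1], t[2]))
    print(chosen[0])                             # "a": the least breathing room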
Static Scheduling
The goal is to find an assignment of tasks
to processors and for each processor, a
static schedule giving the order in which
the tasks are to be run.
A comparison of Dynamic versus Static Scheduling
Static scheduling is a good fit for time-triggered designs.
1. It must be carefully planned in advance, with considerable effort going into choosing the various parameters.
2. In a hard real-time system, wasting resources is often the price that must be paid to guarantee that all deadlines will be met.
3. An optimal or nearly optimal schedule can be derived in advance.
Dynamic scheduling is a good fit for event-triggered designs.
1. It does not require as much advance work, since scheduling decisions are made on the fly, during execution.
2. It can make better use of resources than static scheduling.
3. However, there is no time at run time to search for the best schedule.