4.1.1. Problem Definition
UNIT IV
TOPICS
4.1. Consensus and Agreement Algorithm
4.1.1. Problem Definition
4.1.2. Overview of Results
4.1.3. Agreement in Failure-free system
4.1.4. Agreement in Synchronous system with failure
4.2. Checkpointing and Rollback Recovery
4.2.1. Introduction
4.2.2. Background and Definitions
4.2.3. Issues in Failure Recovery
4.2.4. Checkpoint based Recovery
4.2.5. Log based Rollback Recovery
4.2.6. Coordinated Checkpointing Algorithm
4.2.7. Algorithm for asynchronous Checkpointing and
Recovery
CONSENSUS AND AGREEMENT
4.1.1. Problem Definition
System Model
Agreement problems have been studied
under the following system model:
There are N processors in the system, with at most m of
them being faulty.
Processors can exchange messages directly.
The receiver knows the identity of the sender.
The communication medium is reliable:
messages are delivered without error.
Failure models:
Some of the processes may be faulty in distributed systems.
A faulty process can behave in any manner allowed by the
failure model assumed.
Some of the well-known failure models include fail-stop,
send omission and receive omission, and Byzantine failures.
Fail-stop model: a process may crash in the middle of a
step, which could be the execution of a local operation or
processing of a message for a send or receive event. It may
send a message to only a subset of the destination set
before crashing.
Byzantine failure model: a process may behave
arbitrarily.
The choice of the failure model determines the feasibility
and complexity of solving consensus.
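The difference between the two models can be illustrated with a minimal sketch. The function names, the process labels and the binary value domain below are assumptions made for illustration only; they are not part of the original material.

```python
import random

def fail_stop_broadcast(value, destinations, crash_after):
    """Send `value` to the destinations in order; the sender crashes
    after `crash_after` sends, so only a subset receives the message."""
    delivered = {}
    for i, dest in enumerate(destinations):
        if i >= crash_after:
            break  # the process has crashed; the remaining sends never happen
        delivered[dest] = value
    return delivered

def byzantine_broadcast(value, destinations, rng):
    """A Byzantine sender behaves arbitrarily; here it ignores `value`
    and may send a different (binary) value to each destination."""
    return {dest: rng.choice([0, 1]) for dest in destinations}

dests = ["P1", "P2", "P3", "P4"]  # hypothetical process names
partial = fail_stop_broadcast(1, dests, crash_after=2)
arbitrary = byzantine_broadcast(1, dests, random.Random(0))
```

In the fail-stop case every delivered message is correct but some destinations receive nothing; in the Byzantine case every destination receives something, but the values need not be consistent with each other or with the sender's input.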
Synchronous/asynchronous communication:
In an asynchronous system, if a failure-prone process chooses
to send a message to another process but fails, the intended
recipient cannot detect the non-arrival of the message, because
this case is indistinguishable from a message that is merely
slow in transit.
A consensus algorithm is a procedure that achieves
agreement on a single data value among distributed
processes or systems.
In a synchronous system, an unsent message can
be identified by the intended recipient at the end of the
round.
The intended recipient can deal with the non-arrival of
the expected message by assuming the arrival of a
message containing some default data, and then
proceeding with the next round of the algorithm.
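How a recipient handles a missing message at the end of a synchronous round can be sketched as follows. The default value of 0 and the names `end_of_round` and `inbox` are assumptions for illustration; the material above does not fix a particular default.

```python
DEFAULT_VALUE = 0  # assumed default; any agreed-upon value would do

def end_of_round(expected_senders, inbox, default=DEFAULT_VALUE):
    """Return one value per expected sender, substituting the default
    for every sender whose message did not arrive in this round."""
    return {sender: inbox.get(sender, default) for sender in expected_senders}

# P2's message never arrived during this round.
inbox = {"P1": 1, "P3": 1}
received = end_of_round(["P1", "P2", "P3"], inbox)
```

After the call, `received` holds an entry for every expected sender, so the next round can proceed as if all messages had arrived.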
Network connectivity:
The system has full logical connectivity, i.e.,
each process can communicate with any other
by direct message passing.
Sender identification:
A process that receives a message always
knows the identity of the sender process.
When multiple messages are expected from the
same sender in a single round, a scheduling
algorithm is employed that sends these
messages in sub-rounds, so that each message
sent within the round can be uniquely identified.
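One possible way to make each message within a round uniquely identifiable is to tag it with the sender, the round number and a sub-round index. This is only a sketch of the idea; the tuple layout and the name `schedule_sub_rounds` are hypothetical.

```python
def schedule_sub_rounds(sender, round_no, payloads):
    """Assign each payload its own sub-round, so that the key
    (sender, round, sub_round) identifies the message uniquely."""
    return [((sender, round_no, sub), payload)
            for sub, payload in enumerate(payloads)]

# Two messages from P1 in round 3 get distinct sub-round indices.
msgs = schedule_sub_rounds("P1", 3, ["a", "b"])
```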
Channel reliability:
The channels are reliable, and only the processes
may fail.
Authenticated vs. non-authenticated messages:
With unauthenticated messages, when a faulty
process relays a message to other processes, it can
forge the message or tamper with its contents.
When a process receives a message, it has no way
to verify its authenticity. Such a message is known
as an unauthenticated message, an oral message or an
unsigned message.
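The effect of authentication can be sketched with a keyed hash from the Python standard library. This is a simplification: it assumes a secret key known to the original sender and the final receiver but not to the relaying process, and the key and payloads are hypothetical. Signed-message algorithms are usually described with digital signatures rather than a shared key.

```python
import hashlib
import hmac

def sign(key: bytes, payload: bytes) -> bytes:
    """Compute an authentication tag for the payload."""
    return hmac.new(key, payload, hashlib.sha256).digest()

def verify(key: bytes, payload: bytes, tag: bytes) -> bool:
    """Check that the tag matches the payload (constant-time comparison)."""
    return hmac.compare_digest(sign(key, payload), tag)

key = b"sender-receiver-secret"  # hypothetical key, unknown to the relay
original = b"attack"
tag = sign(key, original)

# A faulty relay changes the contents before passing the message on.
tampered = b"retreat"

# Unauthenticated case: the receiver sees only `tampered` and cannot tell.
# Authenticated case: the receiver checks the tag and detects the change.
original_ok = verify(key, original, tag)
tampered_ok = verify(key, tampered, tag)
```

Here `original_ok` is true and `tampered_ok` is false, so the receiver can reject the altered message; without the tag there is nothing to check against.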
Agreement variable:
The agreement variable may be boolean or
multivalued, and need not be an integer.
For simplicity, it is assumed to be boolean.
This simplifying assumption does not affect the
results for other data types, but helps in the
abstraction while presenting the algorithms.
System assumption
4.1.1. Problem Definition - Byzantine Problem
4.1.1. Problem Definition - Byzantine Problem - Consensus problem