06 Consensus


Distributed Systems
CPSC 5520 Consensus
Kevin Lundeen
Consensus Protocols

• Consensus
• “Where everyone agrees”
• Used to describe distributed systems behavior on replication, especially in
the face of failures
• Especially state machine replication, i.e., log replication
• Log Replication
• We used a logical clock last week to implement an algorithm that successfully
replicated logs
• Intolerant of failures
• Intolerant of dynamic addition/removal of nodes
• Requires reliable ordered messaging
• O(n²) messages per log write
• We’ll study Raft next to overcome most of these issues (textbook has Paxos)
• Even more robust (handling Byzantine failures) is PBFT
Raft

Author’s recent note on the name:


There's a few reasons we came up with the name Raft:
- It's not quite an acronym, but we were thinking about the words 'reliable', 'replicated',
'redundant', and 'fault-tolerant'.
- We were thinking about logs and what can be built using them.
- We were thinking about the island of Paxos and how to escape it.
As a plus, we were using the randomly generated name Cheesomi in the paper
before we came up with the name Raft in September 2012. The name
appeared just over 100 times in our paper submission back then, so switching
to the shorter name actually helped shrink the paper down quite a bit.

- Diego Ongaro

• We will follow an extended version of the Raft paper presented at USENIX 2014
Raft Basics §5.1

• Raft cluster
• small number of servers (five is typical)
• can tolerate a minority of servers failing simultaneously (two can fail
simultaneously in five-node Raft cluster)
• Each server is in one of three states:
1. Leader – sole leader that handles all client communications
2. Follower – most servers merely respond to RPCs from leader and
candidates
3. Candidate – a server that noticed absence of leader and is trying to elect
itself to be leader
• Each leader leads for a term
• Term numbers are incremented at each new election
• Once elected a candidate becomes sole leader for that term
• All entries for a term are initiated by its leader
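The fault-tolerance arithmetic behind "two can fail in a five-node cluster" can be sketched as follows. This is a minimal illustration, not code from the paper; the function names `majority` and `tolerated_failures` are made up for this example.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many servers can fail while a majority is still running."""
    return cluster_size - majority(cluster_size)


if __name__ == "__main__":
    for n in (3, 4, 5, 7):
        print(f"{n} servers: majority={majority(n)}, "
              f"can lose {tolerated_failures(n)}")
```

Note that a four-node cluster tolerates only one failure, the same as a three-node cluster, which is one reason odd cluster sizes are the usual choice.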
Raft Basics §5.1
(continued)
• Elections can result in split vote
• Term ends with no leader (and no entries in log)
• Another election ensues
• Terms act as a logical clock
• Current term is communicated in all RPCs
• if one server’s current term is smaller than the other’s, then it updates its
current term to the larger value
• if a candidate or leader discovers that its term is out of date, it immediately
reverts to follower state
• if a server receives a request with a stale term number, it rejects the request
• Communication is via RPCs
• RequestVote – candidate to all others
• AppendEntries – leader to all followers
• Failed RPCs are retried
• RPCs done in parallel for best performance
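The three term rules above can be sketched as a single check that a server runs on every incoming request. This is a simplified illustration under assumed names (`Server`, `Role`, `observe_term` are hypothetical, not from the paper), and it covers only the rules listed on this slide.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


@dataclass
class Server:
    current_term: int = 0
    role: Role = Role.FOLLOWER

    def observe_term(self, rpc_term: int) -> bool:
        """Apply the term rules to the term carried by an incoming request.

        Returns True if the request may be processed further,
        False if it carries a stale term and must be rejected.
        """
        if rpc_term < self.current_term:
            return False  # stale sender: reject the request
        if rpc_term > self.current_term:
            # Our term is out of date: adopt the larger term and
            # (if we were a candidate or leader) revert to follower.
            self.current_term = rpc_term
            self.role = Role.FOLLOWER
        return True
```

For example, a leader in term 3 that sees a request from term 5 ends up as a follower in term 5, while a request from term 2 is rejected and changes nothing.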
Leader Election §5.2

• Raft uses a heartbeat mechanism to trigger leader election


• If a follower hears nothing from the leader for the election_timeout period, it
transitions to candidate state:
1. Increments current term number
2. Votes for itself
3. Issues RequestVote RPCs to everyone else
4. Once majority of servers grant vote, becomes leader
5. Alternatively, if it fails to get majority within certain period, it restarts the
election (with a new term)
• A candidate may receive messages from:
• A new leader (this leader got the majority), so it becomes a follower
• Another candidate with a higher term, so it reverts to follower and grants its vote if that candidate’s log is at least as up-to-date as its own
• A follower grants at most one vote per term
• To the first candidate whose log is at least as up-to-date as its own
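The candidate's side of the election (steps 1 to 4 above) can be sketched as below. This is a toy model under stated assumptions: `run_election` is a hypothetical name, each peer is represented by a plain callable standing in for the RequestVote RPC, and the calls are made sequentially with no timeouts or retries, whereas a real implementation would issue them in parallel.

```python
from typing import Callable, Dict, Tuple

# Stand-in for the RequestVote RPC: (term, candidate_id) -> vote granted?
RequestVote = Callable[[int, str], bool]


def run_election(candidate_id: str,
                 current_term: int,
                 peers: Dict[str, RequestVote]) -> Tuple[int, bool]:
    """Run one election round; return (new_term, won)."""
    term = current_term + 1                    # 1. increment current term
    votes = 1                                  # 2. vote for itself
    for request_vote in peers.values():        # 3. ask everyone else
        if request_vote(term, candidate_id):
            votes += 1
    cluster_size = len(peers) + 1              # peers plus the candidate
    won = votes > cluster_size // 2            # 4. strict majority wins
    return term, won


if __name__ == "__main__":
    grant = lambda term, cid: True
    deny = lambda term, cid: False
    peers = {"b": grant, "c": grant, "d": deny, "e": deny}
    print(run_election("a", 4, peers))
```

If the round is lost, step 5 corresponds to calling `run_election` again with the returned term, which starts another election with a new term number.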
Up-To-Date

§5.4.1
A’s log is more up-to-date than B’s iff:
1. The term of A’s last log entry is greater than the term of B’s last log entry, or
2. The last entries have the same term, but A has a longer log than B
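The comparison can be sketched in a few lines. Here "term" means the term of the last entry in each log, which is how §5.4.1 of the Raft paper states the rule. Representing a log as a plain list of entry terms is an assumption made for this example, as are the function names.

```python
from typing import List, Tuple


def last_term_and_length(log_terms: List[int]) -> Tuple[int, int]:
    """Summarize a log (given as the list of its entries' terms)."""
    last_term = log_terms[-1] if log_terms else 0  # 0 for an empty log
    return last_term, len(log_terms)


def more_up_to_date(a: List[int], b: List[int]) -> bool:
    """True if log a is strictly more up-to-date than log b."""
    # Tuples compare lexicographically: last term first, then length.
    return last_term_and_length(a) > last_term_and_length(b)


def at_least_as_up_to_date(a: List[int], b: List[int]) -> bool:
    """The check a voter applies to a candidate's log (a) against its own (b)."""
    return last_term_and_length(a) >= last_term_and_length(b)
```

A shorter log can still win: `more_up_to_date([1, 1, 3], [1, 1, 2, 2])` is true because the last term 3 beats the last term 2, regardless of length.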
Log Replication §5.3

• The paper talks about replicated state machines, so:


• Their “log” consists of both the entries in the server’s volatile internal queue
along with committed entries
• Entries that have been committed have been applied to the local state
machine; uncommitted entries have not
• Comparing it to how we talked about log replication last week, our “red line”
indicated which entries had been committed (those above the red line) and
those that were still in flux. Raft is the same, but they call the combined list
the log and distinguish the committed entries.
• Each entry has a term number and an index
• Term number was the current term of the leader at the time the client asked
the leader to apply it
• Index is the entry’s position in the log (counted from the beginning of the system)
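One way to picture the combined log with its "red line" is the data structure below. It is only a sketch: the class and field names (`LogEntry`, `Log`, `commit_index`) and the choice to start indexes at 1 are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class LogEntry:
    term: int      # leader's term when the client request arrived
    index: int     # position in the log, starting at 1
    command: str   # operation to apply to the state machine


@dataclass
class Log:
    entries: List[LogEntry] = field(default_factory=list)
    commit_index: int = 0  # the "red line"; 0 means nothing committed yet

    def append(self, term: int, command: str) -> LogEntry:
        """Add a new (uncommitted) entry at the next index."""
        entry = LogEntry(term=term, index=len(self.entries) + 1,
                         command=command)
        self.entries.append(entry)
        return entry

    def committed(self) -> List[LogEntry]:
        return self.entries[:self.commit_index]

    def uncommitted(self) -> List[LogEntry]:
        return self.entries[self.commit_index:]
```

Moving `commit_index` forward is the only thing that turns an entry from "in flux" into committed; the entries themselves stay in the same list.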
Log Replication §5.3
(continued)
• Entry index increments for every new entry
• Term number increments for every election
Log Replication §5.3
(continued)
• A new leader gets all the followers in sync with itself
• A leader sends new entries to the followers
• A leader commits its new entries when they have been replicated to a majority of the servers (counting itself)
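The commit rule can be sketched as finding the highest log index that a majority of servers already hold. The function name `commit_index` and its parameters are hypothetical. The sketch also leaves out a further restriction from §5.4.2 of the paper: a leader only commits this way entries from its own current term.

```python
from typing import List


def commit_index(leader_last_index: int,
                 follower_match_index: List[int]) -> int:
    """Highest index stored on a majority of servers.

    leader_last_index: index of the leader's last log entry.
    follower_match_index: for each follower, the highest index known
        to be replicated on it (0 if nothing is replicated yet).
    """
    # The leader counts toward the majority along with the followers.
    replicated = sorted([leader_last_index] + list(follower_match_index),
                        reverse=True)
    majority = len(replicated) // 2 + 1
    # The majority-th largest value is held by at least `majority` servers.
    return replicated[majority - 1]


if __name__ == "__main__":
    # Five servers: leader at 7, followers at 7, 5, 4 and 2.
    print(commit_index(7, [7, 5, 4, 2]))
```

In the five-server example, entries up to index 5 are on three servers (the leader and two followers), so index 5 is the highest that can be committed.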
The End
