Raft (algorithm): Difference between revisions

Content deleted Content added
Basics: pleonasm
Adding details on leader election in Raft. Adding Safety problem. Still super incomplete and require lots of future modifications incoming from me
Line 2:
 
== Basics ==
Raft achieves consensus via an elected leader. A server in a raft cluster is either a ''leader,'' or a candidate''follower'', orand can be a follower''candidate'' in the precise case of an election (leader unavailable) . The leader is responsible for log replication to the followers. It regularly informs the followers of its existence by sending a heartbeat message. Each follower has a timeout (typically between 150 and 300 ms) in which it expects the heartbeat from the leader. The timeout is reset on receiving the heartbeat. If no heartbeat is received the follower changes its status to candidate and starts a leader election.<ref name="paper"/><ref name="secretlives"/>
 
=== Approach of the consensus problem in Raft ===
=== Leader Election ===
Raft implements consensus by a leader approach. The server has one and only one elected leader which is fully responsible of managing log replication on the other servers of the cluster. It means that the leader can decide of new entries placement and establishment of data flow between him and the other servers without consulting other servers. A leader leads until it fails or disconnect, in which case a new leader is elected.
A leader election is started by a candidate server. A server becomes a candidate if it receives no heartbeat from the leader within the timeout. It starts the election by increasing the term counter and sending a RequestVote message to all other servers. The other servers will vote for the first candidate that sends them a RequestVote message. A server will only vote once per term. If the candidate receives a message from a leader with a term number equal to or larger than the current term, then its election is defeated and the candidate changes into a follower. If a candidate receives a majority of votes, then it becomes the new leader. If neither happens, e.g., because of a split vote, then a new leader election is started after a timeout.<ref name="paper"/><ref name="secretlives"/>
 
The consensus problem is decomposed in Raft into three relatively independent subproblems listed down below.
The timeout values of each server should be spread out within a reasonable interval. This should reduce the chance of a split vote because servers won't become candidates at the same time.<ref name="paper"/>
 
==== LogLeader ReplicationElection ====
When the existing leader fails or when you start your algorithm, a new leader needs to be elected.
 
In this case, a new ''term'' starts in the cluster. A term is an arbitrary period of time on the server on which no new leader needs to be elected. Each term starts with a leader election. If the election is completed successfully (i. e. a single leader is elected) the term keeps going with normal operations orchestrated by the new leader. If the election is a failure, a new term starts, with a new election.
 
A leader election is started by a ''candidate'' server. A server becomes a candidate if it receives no heartbeatcommunication fromby the leader withinover a period called the ''election timeout'', so it assumes there is no acting leader anymore. It starts the election by increasing the term counter, vote for himself as new leader, and sendingsend a RequestVote message to all other servers. Therequesting other servers willthere vote for the first candidate that sends them a RequestVote message. A server will vote only vote once per term, on a first-come-first-served basis. If thea candidate receives a message from aanother leaderserver with a term number equalat toleast oras largerlarge thanas the candidate's current term, then its election is defeated and the candidate changes into a follower and recognize the leader as legitimate. If a candidate receives a majority of votes, then it becomes the new leader. If neither happens, e.g., because of a split vote, then a new leaderterm electionstarts, isand starteda afternew aelection timeoutbegins.<ref name="paper"/><ref name="secretlives"/>
 
Raft uses randomized election timeout to ensure that split votes problem are resolved quickly. This should reduce the chance of a split vote because servers won't become candidates at the same time : a single server will timeout, win the election, then become leader and sends heartbeat messages to other servers before any of the followers can become candidate.<ref name="paper" />
 
==== Log Replication ====
The leader is responsible for the log replication. It accepts client requests. The requests are forwarded to the followers in AppendEntries messages. Once the leader receives confirmation from the majority of its followers the request is considered committed.<ref name="paper"/><ref name="secretlives"/>
 
==== Safety ====
Raft guarantees each of these safety properties :
* '''Election safety:''' at most on leader can be elected in a given term.
* '''Leader Append-Only:''' a leader can only appends new entries to its logs (it can't overwrite neither delete entries).
* '''Log Matching:''' if two logs contain an entry with the same index and term, then the logs are identical in all entries up through the given index.
* '''Leader Completeness:''' if a log entry is committed in a given term then it will be present in the logs of the leaders since this term
* '''State Machine Safety:''' if a server has applied a particular log entry to its state machine, then no other server may apply a different command for the same log.
 
== References ==