Chandra–Toueg consensus algorithm

{{inline|date=May 2024}}
{{multiple issues|
{{refimprove|date=October 2011}}
{{tone|date=October 2011}}
}}
The '''Chandra–Toueg consensus algorithm''', published by Tushar Deepak Chandra and Sam Toueg in 1996, is an algorithm for solving [[Consensus (computer science)|consensus]] in a network of unreliable processes equipped with an ''eventually strong'' [[failure detector]]. The failure detector is an abstract version of [[Timeout (computing)|timeouts]]; it signals to each process when other processes may have crashed. An eventually strong failure detector is one that, after some initial period of confusion, never identifies ''some'' specific non-faulty process as having failed and, at the same time, eventually identifies ''all'' faulty processes as failed. Here a faulty process is one that eventually crashes, and a non-faulty process is one that never does. The algorithm assumes that the faulty processes are a minority: the number of faulty processes, denoted by {{var|f}}, satisfies {{var|f}} < {{var|n}}/2, where {{var|n}} is the total number of processes.
 
== The algorithm ==
The algorithm proceeds in rounds and uses a rotating coordinator: in each round {{var|r}}, the process whose identity is given by {{var|r}} mod {{var|n}} is chosen as the coordinator. Each process keeps track of its current preferred decision value (initially equal to the input of the process) and the last round where it changed its decision value (the value's [[timestamp]]). The actions carried out in each round are:
 
# All processes send (r, preference, timestamp) to the coordinator.
# The coordinator waits to receive messages from a majority of the processes (including itself).
## It then chooses as its preference a value with the most recent timestamp among those sent.
# The coordinator sends (r, preference) to all processes.
# Each process waits (1) to receive (r, preference) from the coordinator, or (2) for its failure detector to identify the coordinator as crashed.
## In the first case, it sets its own preference to the coordinator's preference and responds with ack(r).
## In the second case, it sends nack(r) to the coordinator.
# The coordinator waits to receive ack(r) or nack(r) from a majority of processes.
## If it receives ack(r) from a majority, it sends decide(preference) to all processes.
# Any process that receives decide(preference) for the first time relays decide(preference) to all processes, then decides preference and terminates.
 
Note that this algorithm is used to decide only on one value.
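
The round structure above can be illustrated with a small Python simulation. This is only a sketch under simplifying assumptions that are not part of the source: rounds run synchronously and start at 1, crashed processes are fixed in advance, every live process answers the coordinator (one possible schedule in which it hears from a majority), and the names <code>Process</code>, <code>run_round</code>, <code>run</code> and the <code>suspects</code> callback are hypothetical stand-ins for message passing and the failure detector.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class Process:
    pid: int
    preference: str
    timestamp: int = 0            # last round in which the preference changed
    crashed: bool = False
    decision: Optional[str] = None


def run_round(procs: List[Process], r: int,
              suspects: Callable[[Process, Process], bool]) -> Optional[str]:
    """Simulate round r; return the decided value, or None if nothing is decided.

    suspects(p, c) is a hypothetical failure-detector query: True means
    process p currently suspects coordinator c of having crashed.
    """
    n = len(procs)
    majority = n // 2 + 1
    coord = procs[r % n]                      # rotating coordinator
    alive = [p for p in procs if not p.crashed]
    if coord.crashed or len(alive) < majority:
        return None                           # the round cannot succeed

    # Steps 1-2: the coordinator hears from the live processes and adopts a
    # preference carrying the most recent timestamp.
    proposal = max(alive, key=lambda p: p.timestamp).preference

    # Steps 3-4: each live process adopts the proposal and acks, unless its
    # failure detector suspects the coordinator, in which case it nacks.
    acks = 0
    for p in alive:
        if p is coord or not suspects(p, coord):
            p.preference, p.timestamp = proposal, r
            acks += 1

    # Steps 5-6: with acks from a majority, the decision is sent and relayed.
    if acks >= majority:
        for p in alive:
            p.decision = proposal
        return proposal
    return None


def run(inputs: List[str], crashed: Iterable[int] = (),
        suspects: Callable[[Process, Process], bool] = lambda p, c: False,
        max_rounds: int = 10) -> Tuple[Optional[str], int]:
    """Run rounds 1..max_rounds; return (decided value or None, last round run)."""
    down = set(crashed)
    procs = [Process(i, v, crashed=(i in down)) for i, v in enumerate(inputs)]
    for r in range(1, max_rounds + 1):
        value = run_round(procs, r, suspects)
        if value is not None:
            return value, r
    return None, max_rounds
```

For example, <code>run(["a", "b", "c"], crashed={1})</code> returns <code>("a", 2)</code>: round 1 fails because its coordinator (process 1) has crashed, and the round 2 coordinator gathers acknowledgments from both surviving processes.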
 
== Correctness ==
 
=== Problem definition ===
 
An algorithm which "solves" the consensus problem must ensure the following properties:
 
# termination: every non-faulty process eventually decides on a value;
# agreement: all processes that decide, decide on the same value; and
# validity: every decided value was some process's input value.
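
As a rough illustration, the three properties can be written as a checker over the outcome of one run. The function name <code>check_consensus</code> and its dictionary-based interface are assumptions made for this sketch. It also assumes that termination is only required of processes not listed in <code>faulty</code>, since a crashed process cannot decide.

```python
from typing import Dict, Hashable, Iterable


def check_consensus(inputs: Dict[int, Hashable],
                    decisions: Dict[int, Hashable],
                    faulty: Iterable[int] = ()) -> Dict[str, bool]:
    """Check one finished run against the three consensus properties.

    inputs    maps each process id to its input value.
    decisions maps each process id that decided to its decided value.
    faulty    lists the ids of processes that crashed during the run.
    """
    correct = set(inputs) - set(faulty)
    decided_values = set(decisions.values())
    return {
        # Every process that never crashed has decided.
        "termination": all(pid in decisions for pid in correct),
        # No two processes decided differently.
        "agreement": len(decided_values) <= 1,
        # Every decided value was proposed by some process.
        "validity": decided_values <= set(inputs.values()),
    }
```

A run in which process 1 crashes and the others decide the same input value satisfies all three checks. A run in which two processes decide different values fails only the agreement check.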
 
=== Assumptions ===
 
Before arguing that the Chandra–Toueg consensus algorithm satisfies the three properties above, recall that this algorithm requires {{var|n}} ≥ 2{{var|f}} + 1 processes, of which at most {{var|f}} are faulty (equivalently, {{var|f}} < {{var|n}}/2).
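
The bound on the number of faulty processes matters because the coordinator waits for a majority. The following sketch, with helper names <code>majority</code> and <code>max_faulty</code> chosen here for illustration, checks the arithmetic for small systems: with the largest tolerated number of crashes the survivors still form a majority, and with one more crash they do not.

```python
def majority(n: int) -> int:
    """Smallest number of processes that is more than half of n."""
    return n // 2 + 1


def max_faulty(n: int) -> int:
    """Largest f satisfying f < n / 2."""
    return (n - 1) // 2


for n in range(1, 8):
    f = max_faulty(n)
    # With f crashes, the surviving processes can still answer as a majority.
    assert n - f >= majority(n)
    # With one more crash, a coordinator waiting for a majority would block.
    assert n - (f + 1) < majority(n)
```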
 
Furthermore, note that this algorithm assumes the existence of an ''eventually strong failure detector'' that each process can query to learn which processes are currently suspected of having crashed. An eventually strong failure detector is one that, after some initial period of confusion, ''never'' identifies ''some'' specific non-faulty (or correct) process as having failed and, at the same time, eventually identifies ''all'' faulty processes as failed.
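
The source treats the failure detector as an abstraction of timeouts and does not prescribe an implementation. One common way to approximate such a detector is with heartbeats and a timeout that grows after every false suspicion. The class below is a hypothetical sketch of that idea: the name <code>TimeoutDetector</code>, the doubling rule and the explicit <code>now</code> argument are all assumptions of this example. It can only behave like an eventually strong detector if message delays eventually become bounded.

```python
from typing import Dict, Iterable, Set


class TimeoutDetector:
    """Hypothetical heartbeat-based failure detector with adaptive timeouts."""

    def __init__(self, peers: Iterable[str], initial_timeout: float = 1.0) -> None:
        peers = list(peers)
        self.timeout: Dict[str, float] = {p: initial_timeout for p in peers}
        self.last_seen: Dict[str, float] = {p: 0.0 for p in peers}
        self.suspected: Set[str] = set()

    def heartbeat(self, peer: str, now: float) -> None:
        """Record a heartbeat received from peer at time now."""
        self.last_seen[peer] = now
        if peer in self.suspected:
            # The suspicion was wrong: trust the peer again and wait longer
            # before suspecting it next time.
            self.suspected.discard(peer)
            self.timeout[peer] *= 2

    def suspects(self, peer: str, now: float) -> bool:
        """Return True if peer is suspected of having crashed at time now."""
        if now - self.last_seen[peer] > self.timeout[peer]:
            self.suspected.add(peer)
        return peer in self.suspected
```

A process that has really crashed stops sending heartbeats and is suspected forever once its timeout expires. A slow but live process is suspected at most until its next heartbeat arrives.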
 
=== Proof of correctness ===
 
''Termination'' holds because eventually the failure detector stops suspecting ''some'' non-faulty process p, and eventually p becomes the coordinator. If the algorithm has not terminated before this occurs in some round r, then every non-faulty process in round r waits to receive p's preference and responds with ack(r). This allows p to collect enough acknowledgments to send decide(preference), causing every non-faulty process to decide and terminate. Alternatively, it may be that some faulty coordinator sends decide to only a few processes; but if any of these processes is non-faulty, it relays the decision to all the remaining processes, causing them to decide and terminate.

''Validity'' follows from the fact that every preference starts out as some process's input; there is nothing in the protocol that generates new preferences.
 
''Agreement'' is potentially the most difficult to achieve. It might seem possible that a coordinator, in one round r, sends a decide message for some value v that propagates to only a few processes before some other coordinator, in a later round r', sends a decide message for some other value v'. To show that this does not occur, observe that before the first coordinator can send decide(v), it must have received ack(r) from a majority of processes, each of which adopted v as its preference in round r. When any later coordinator polls a majority of processes, the later majority overlaps the earlier one, so at least one reply carries v, and v is the value with the most recent timestamp. So, any two coordinators that send out decide messages send out the same value.
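
The overlap step can be made explicit with a short counting argument. The symbols Q<sub>r</sub> (the majority that acknowledged in round r) and Q<sub>r'</sub> (the majority polled by the later coordinator) are notation introduced here for illustration. Each set contains at least ⌊n/2⌋ + 1 processes, and their union contains at most n:

```latex
\[
  |Q_r \cap Q_{r'}|
  \;=\; |Q_r| + |Q_{r'}| - |Q_r \cup Q_{r'}|
  \;\ge\; 2\left(\left\lfloor \tfrac{n}{2} \right\rfloor + 1\right) - n
  \;\ge\; 1 .
\]
```

The last inequality holds because ⌊n/2⌋ ≥ (n − 1)/2, so the two majorities always share at least one process.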
 
== References ==
{{DEFAULTSORT:Chandra-Toueg consensus algorithm}}
[[Category:Distributed algorithms]]
[[Category:Fault tolerance]]
[[Category:Fault-tolerant computer systems]]