Chandra–Toueg consensus algorithm: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 18:48, 14 January 2019 edit Nbro (talk \| contribs) Extended confirmed users 3,319 edits →Why it works ← Previous edit		Latest revision as of 16:38, 5 May 2024 edit undo AnomieBOT (talk \| contribs) Bots 6,855,530 edits m Dating maintenance tags: {{Inline}}
(11 intermediate revisions by 5 users not shown)
Line 1: {{inline\|date=May 2024}} The '''Chandra–Toueg consensus algorithm''', published by Tushar Deepak Chandra and Sam Toueg in 1996, is an algorithm for solving [[Consensus (computer science)\|consensus]] in a network of unreliable processes equipped with an ''eventually strong'' [[failure detector]]. The failure detector is an abstract version of [[Timeout (computing)\|timeouts]]; it signals to each process when other processes may have crashed. An eventually strong failure detector is one that never identifies ''some'' specific non-faulty process as having failed after some initial period of confusion, and, at the same time, eventually identifies ''all'' faulty processes as failed (where a faulty process is a process which eventually fails or crashes and a non-faulty process never fails). The Chandra–Toueg consensus algorithm assumes that the number of faulty processes, denoted by {{var\|f}}, is less than n/2 (i.e. the minority), i.e. it assumes {{var\|f}} < {{var\|n}}/2, where n is the total number of processes. == The algorithm == The algorithm proceeds in rounds and uses a rotating coordinator: in each round {{var\|r}}, the process whose identity is given by {{var\|r}} mod {{var\|n}} is chosen as the coordinator. Each process keeps track of its current preferred decision value (initially equal to the input of the process) and the last round where it changed its decision value (the value's [[timestamp]]). The actions carried out in each round are: # All processes send (r, preference, timestamp) to the coordinator. Line 13 ⟶ 14: # The coordinator waits to receive ack(r) or nack(r) from a majority of processes. ## If it receives ack(r) from a majority, it sends decide(preference) to all processes. # Any process that receives decide(preference) for the first time ~~sends~~relays decide(preference) to all processes, then decides preference and terminates. Note that this algorithm is used to decide only on one value. == Correctness == ~~The consensus problem requires:~~ === Problem definition === # termination (all processes decide); ▼ # validity (all processes decide on a value that was some process's input value); and ▼ An algorithm which "solves" the consensus problem must ensure the following properties: # agreement (all processes decide on the same value). ▼ ▲# termination: (all processes decide); on a value; ▲# agreement: (all processes decide on the same value). ; and ▲# validity: (all processes decide on a value that was some process's input value); ~~and~~ === Assumptions === Before arguing that the Chandra–Toueg consensus algorithm satisfies the three properties above, recall that this algorithm requires {{var\|n}} = 2*{{var\|f}} + 1 processes, where at most f of which are faulty. ~~Before~~Furthermore, ~~arguing~~note that ~~the~~this algorithm ~~above satisfies~~assumes the ~~3 properties above, we recall the definition~~existence of an ''eventually strong failure detector'' (which are accessible and can be used to detect the crash of a node). An eventually strong failure detector is one that ''never'' identifies ''some'' specific non-faulty (or correct) process as having failed, after some initial period of confusion, and, at the same time, eventually identifies ''all'' faulty processes as failed.▼ === Proof of correctness === ▲Before arguing that the algorithm above satisfies the 3 properties above, we recall the definition of an ''eventually strong failure detector''. An eventually strong failure detector is one that ''never'' identifies ''some'' specific non-faulty (or correct) process as having failed, after some initial period of confusion, and, at the same time, eventually identifies ''all'' faulty processes as failed. ''Termination'' holds because eventually the failure detector stops suspecting ''some'' non-faulty process p and eventually p becomes the coordinator. If the algorithm has not terminated before this occurs in some round r, then every non-faulty process in round r waits to receive p's preference and responds with ack(r). This allows p to collect enough acknowledgments to send decide(preference), causing every process to terminate. Alternatively, it may be that some faulty coordinator sends decide only to a few processes; but if any of these processes are non-faulty, they broadcast the decision to all the remaining processes, causing them to decide and terminate. Line 36 ⟶ 48: {{DEFAULTSORT:Chandra-Toueg consensus algorithm}} [[Category:Distributed algorithms]] ~~[[Category:Fault tolerance]]~~ [[Category:Fault-tolerant computer systems]]