Bully algorithm: Difference between revisions

In [[distributed computing]], the '''bully algorithm''' is a method for dynamically [[Leader election|electing]] a [[Distributed computing#Election|coordinator]] or leader from a group of distributed computer processes. The process with the highest process ID number from amongst the non-failed processes is selected as the coordinator.
{{Multiple issues|
{{Confusing|reason=is it unclear what is non-trivial about this algorithm|date=January 2015}}
{{Confusing|reason=is the communication model missing, is it all-to-all communication?|date=January 2015}}
{{Contradict|about=the assumption of being synchronous (having rounds for all processes) and using timeouts|date=January 2015}}
{{Confusing|reason=is the section about "Election Type" a mix between an algorithm description and a comparison with another algorithm (which one even?) |date=January 2015}}
}}
 
The '''bully algorithm''' is a programming mechanism that applies a hierarchy to the nodes of a system, making each process either the coordinator or a slave.
It is used in [[distributed computing]] as a method for dynamically electing a [[Distributed Computing#Coordinator Election|coordinator]] by process ID number: the process with the highest process ID number is selected as the [[Distributed Computing#Coordinator Election|coordinator]].
 
==Assumptions==
 
As this algorithm is part of a system model that aims to build a failure-free system (like the solution presented in Lamport's paper), some assumptions about the model are needed.

The algorithm assumes that:<ref>{{cite book |last1=Coulouris |first1=George |last2=Dollimore |first2=Jean |last3=Kindberg |first3=Tim |title=Distributed Systems: Concepts and Design |date=2000 |publisher=Addison Wesley |isbn=978-0201619188 |edition=3rd}}</ref>
* the system is synchronous.
* processes may fail at any time, including during execution of the algorithm.
* a process fails by stopping and returns from failure by restarting.
* there is a failure detector which detects failed processes.
* message delivery between processes is reliable.
* each process knows its own process id and address, and that of every other process.
 
* The system is synchronous and uses timeouts to identify process failure. (Bounds such as Delta and Cmax are therefore available to calculate a timeout, whereas in an asynchronous system no timeout can be calculated, so an omission failure of a process cannot be distinguished from a delay.)
* Processes are allowed to crash during execution of the algorithm. (The timeout is To = 2*Delta + Cmax, so the timer knows when an omission failure has happened.)
* Message delivery between processes should be reliable. (The coordinator dilemma: is it trustworthy, or may impersonation, injection, replication, or DoS happen?)
* Prior information about the IDs of the other processes must be known. (This works like Leslie Lamport's solution to the Byzantine dilemma, where the coordinator needs a key and an ID for each process and where the processor hierarchy designates nodes as Generals, Commanders and Lieutenants, but here without a key and with only a coordinator and slaves.)

==Algorithm==
The [[algorithm]] uses the following message types:
* Election Message: Sent to announce an election.
* Answer (Alive) Message: Responds to the Election message.
* Coordinator (Victory) Message: Sent by the winner of the election to announce victory.
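
The three message types can be represented as a small data structure. The following Python sketch is only an illustration: the names <code>MessageType</code>, <code>Message</code> and <code>answer_for</code> are hypothetical and do not come from any cited source, and process IDs are assumed to be plain integers.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class MessageType(Enum):
    ELECTION = auto()     # announces an election
    ANSWER = auto()       # "alive" response to an Election message
    COORDINATOR = auto()  # victory announcement by the winner


@dataclass(frozen=True)
class Message:
    kind: MessageType
    sender: int    # process ID of the sender (assumed to be an integer)
    receiver: int  # process ID of the receiver


def answer_for(msg: Message) -> Optional[Message]:
    """Return the Answer a live receiver sends back, or None.

    Only an Election message from a lower-ID process is answered.
    """
    if msg.kind is MessageType.ELECTION and msg.sender < msg.receiver:
        return Message(MessageType.ANSWER, sender=msg.receiver, receiver=msg.sender)
    return None
```

Keeping the message immutable (<code>frozen=True</code>) is a design choice of this sketch, so that a message cannot be altered after it has been "sent".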
 
Note that this algorithm can be applied to distributed or centralized systems, because the processes can be located on one machine or spread over several: multicast calls, system calls, or both can be used if the system is hybrid (for example, a multithreaded server working with several clients).

When a process {{var|P}} recovers from failure, or the failure detector indicates that the current coordinator has failed, {{var|P}} performs the following actions:
 
# If {{var|P}} has the highest process ID, it sends a Victory message to all other processes and becomes the new Coordinator. Otherwise, {{var|P}} broadcasts an Election message to all other processes with higher process IDs than itself.
# If {{var|P}} receives no Answer after sending an Election message, then it broadcasts a Victory message to all other processes and becomes the Coordinator.
# If {{var|P}} receives an Answer from a process with a higher ID, it sends no further messages for this election and waits for a Victory message. (If there is no Victory message after a period of time, it restarts the process at the beginning.)
# If {{var|P}} receives an Election message from another process with a lower ID, it sends an Answer message back and, if it has not already started an election, it starts the election process at the beginning by sending an Election message to higher-numbered processes.
# If {{var|P}} receives a Coordinator message, it treats the sender as the coordinator.
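
The steps above can be sketched as a small single-threaded simulation. This is a minimal illustration under stated assumptions, not a reference implementation: the function name <code>bully_election</code> and its parameters are hypothetical, failure detection is modeled simply as membership in the set <code>alive</code> (a crashed process never answers), timeouts are not modeled, and the initiator is assumed to be alive.

```python
from collections import deque


def bully_election(all_ids, alive, initiator):
    """Simulate one election and return (coordinator, message_log).

    all_ids   -- IDs of every process in the group (known to all processes)
    alive     -- set of IDs that have not failed (assumption: includes initiator)
    initiator -- ID of the process that starts the election
    """
    ids = sorted(all_ids)
    started = set()             # processes that have already started an election
    pending = deque([initiator])
    log = []                    # (sender, receiver, message type)
    coordinator = None

    while pending:
        p = pending.popleft()
        if p in started:
            continue            # each process starts at most one election
        started.add(p)

        answered = False
        for q in ids:
            if q <= p:
                continue
            log.append((p, q, "ELECTION"))
            if q in alive:      # a live higher-ID process answers and takes over
                log.append((q, p, "ANSWER"))
                answered = True
                pending.append(q)

        if not answered:        # no higher-ID process is alive: p wins
            for q in ids:
                if q != p:
                    log.append((p, q, "COORDINATOR"))
            coordinator = p

    return coordinator, log


if __name__ == "__main__":
    # Process 5 has crashed; process 2 notices and starts an election.
    winner, messages = bully_election(range(1, 6), {1, 2, 3, 4}, initiator=2)
    print(winner)  # prints 4
```

In this sketch the highest live ID always ends up as coordinator, because every live process with a higher ID than the initiator is eventually given the chance to run its own election.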
 
These are the components of the bully algorithm:
* Election message: sent to announce an election.
* Answer message: responds to the Election message.
* Coordinator message: sent to announce the identity of the elected process.

===Analysis===

====Safety====
The safety property expected of [[leader election]] protocols is that every non-faulty process either elects a process {{var|Q}}, or elects none at all. Note that all [[process (computing)|processes]] that elect a leader must decide on the same process {{var|Q}} as the leader. The Bully algorithm satisfies this property (under the system model specified), and at no point in time is it possible for two processes in the group to have a conflicting view of who the leader is, except during an election. To see this, suppose otherwise: then there are two processes {{var|X}} and {{var|Y}} that have both sent the Coordinator (victory) message to the group, which means that {{var|X}} and {{var|Y}} must also have sent each other victory messages. But this cannot happen, since before the victory message is sent, Election messages would have been exchanged between the two, and the process with the lower process ID of the two would never send out victory messages. This is a contradiction, so the assumption that there are two leaders in the system at any given time is false, which shows that the bully algorithm is safe.
 
====Liveness====
[[Liveness]] is also guaranteed in the [[synchronous]], crash-recovery model. Consider the would-be leader failing after sending an Answer (Alive) message but before sending a Coordinator (victory) message. If it does not recover before the timeout set on the lower-ID processes expires, one of them will eventually become leader (even if some of the other processes crash). If the failed process recovers in time, it simply sends a Coordinator (victory) message to all of the group.

Compared with the ring election algorithm:
* Assumes that the system is synchronous
* Uses timeout to detect process failure/crash
* Each processor knows which processors have a higher identifier number than its own and communicates with those<ref>Jean Dollimore, Tim Kindberg, George F. Coulouris, ''Distributed Systems: Concepts and Design'' (3rd ed.). Addison-Wesley, 2003.</ref>
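
The timeout on which liveness depends can be sketched as follows, using the bound To = 2*Delta + Cmax quoted earlier in the article. The article does not define the two symbols, so this sketch assumes that Delta is the maximum one-way message delay and Cmax the maximum time a process needs to handle a message; the function names are hypothetical.

```python
def election_timeout(delta, c_max):
    """Upper bound on the wait for an Answer: To = 2*Delta + Cmax.

    Assumption: delta is the maximum one-way message delay and c_max the
    maximum message-processing time, both in the same time unit.
    """
    return 2 * delta + c_max


def higher_process_failed(elapsed, delta, c_max, answer_received):
    """Treat the higher-ID process as failed only if no Answer arrived
    and more than the timeout has elapsed since the Election message."""
    return (not answer_received) and elapsed > election_timeout(delta, c_max)
```

With, for instance, a delay bound of 50 ms and a processing bound of 20 ms, a process would wait 120 ms before concluding that no higher-ID process is alive.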
 
====Network bandwidth utilization====
{{see also|network bandwidth}}

Assuming that the bully algorithm messages are of a fixed (known, invariant) size, the largest number of messages is exchanged in the group when the process with the lowest ID initiates an election. This process sends (N−1) Election messages, the next higher ID sends (N−2) messages, and so on, for a total of <math>\tfrac{N(N-1)}{2} = \Theta\left(N^2\right)</math> Election messages. There are also <math>\Theta\left(N^2\right)</math> Alive messages and <math>\Theta\left(N\right)</math> Coordinator messages, so the overall number of messages exchanged in the worst case is <math>\Theta\left(N^2\right)</math>.

==Bully algorithm structure==
When a process P determines that the current coordinator is down because of message timeouts or failure of the coordinator to initiate a handshake, it performs the following sequence of actions:
 
# P broadcasts an election message (inquiry) to all other processes with higher process IDs, expecting an "I am alive" response from them if they are alive.
# If P hears from no process with a higher process ID than its own, it wins the election and broadcasts victory.
# If P hears from a process with a higher ID, P waits a certain amount of time for any process with a higher ID to broadcast itself as the leader. If it does not receive this message in time, it re-broadcasts the election message.
# If P gets an election message (inquiry) from another process with a lower ID, it sends an "I am alive" message back and starts a new election.
Note that if P receives a victory message from a process with a lower ID number, it immediately initiates a new election. This is how the algorithm gets its name: a process with a higher ID number will bully a lower-ID process out of the coordinator position as soon as it comes online.
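
As a rough check on the <math>\Theta\left(N^2\right)</math> bound from the bandwidth analysis, the messages produced by the steps above can be counted for the worst case. The sketch below assumes that all N processes are alive, that IDs are 1 to N, that the lowest ID starts the election, that each process starts exactly one election, and that every Election message to a live process is answered once; the function name is hypothetical.

```python
def worst_case_messages(n):
    """Count messages when the lowest of n live processes starts an election.

    Returns a tuple (election, answer, coordinator).
    """
    # The process with the i-th lowest ID sends one Election message
    # to each of the n - i processes above it.
    election = sum(n - i for i in range(1, n))
    # Assumption: every Election message reaches a live process and is answered.
    answer = election
    # The winner announces itself to the other n - 1 processes.
    coordinator = n - 1
    return election, answer, coordinator


if __name__ == "__main__":
    e, a, c = worst_case_messages(5)
    print(e, a, c, e + a + c)  # prints 10 10 4 24
```

Under these assumptions the total is N(N−1) + (N−1) = N² − 1 messages, which grows quadratically in N.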
 
== See also ==
*[[Leader election]]
*[[Chang and Roberts algorithm]]
 
{{reflist}}
* Witchel, Emmett (2005). [http://www.cs.utexas.edu/users/witchel/372/lectures/25.DistributedCoordination.ppt "Distributed Coordination"]. Retrieved May 4, 2005.
* Hector Garcia-Molina, Elections in a Distributed Computing System, IEEE Transactions on Computers, Vol. C-31, No. 1, January (1982) 48–59
* L. Lamport, R. Shostak, and M. Pease, [http://research.microsoft.com/en-us/um/people/lamport/pubs/byz.pdf "The Byzantine Generals Problem"], ACM Transactions on Programming Languages and Systems, Vol. 4, No. 3, July 1982.
 
==External links==
*{{Commonscatinline}}
[[Category:Distributed algorithms]]
[[Category:Graph algorithms]]