Bully algorithm

The bully algorithm is a programming mechanism that applies a hierachy to nodes on a distributed system making them Coordinator or Slaves. This is used as a method in distributed computing for dynamically electing a coordinator by process ID number. The process with the highest process ID number is selected as the coordinator.

Assumptions

As this algorithm is part from a distributed system model wich tries to make it fail-free (like the solution shown in[1]), we need some assumptions as the whole point of the model is to make a distributed system, free from arbitrary failures (not from processing ones) while saving some computational costs.

The system is synchronous and uses timeout for identifying process failure. (so you can have Delta and Cmax in order to calculate timeout as opposed to asynchronous systems where you can't calculate a timeout and then you can't distinguise between an omision fail on a process or a delay)
Allows processes to crash during execution of algorithm. (To=2*Delta+Cmax so timer knows when omision fails happens)
Message delivery between processes should be reliable.(Coordinator dilema,¿is it trustworthy; or suplantation,inyection,replication,DoS may happen?)
Prior information about other process id's must be known. (This works as Leslie Lamport solution for Byzantine dilema, where coordinator needs a key and id for each process and where processors hierachy stipulates nodes as Generals, Commanders and Liutenants but without a key and with only coordinator and slaves)

Component calls

This are the Bully-algorithm components:

Election Message: Sent to announce faster election
Answer Message: Respond to the election message
Coordinator message: Sent to announce the identity of the elected process

Compare with Ring algorithm:

Assumes that system is synchronous
Uses timeout to detect process failure/crash
Each processor knows which processor has the higher identifier number and communicates with that^[1]

Algorithm structure

When a process P determines that the current coordinator is down because of message timeouts or failure of the coordinator to initiate a handshake, it performs the following sequence of actions:

P broadcasts an election message (inquiry) to all other processes with higher process IDs, expecting an "I am alive" response from them if they are alive.
If P hears from no process with a higher process ID than it, it wins the election and broadcasts victory.
If P hears from a process with a higher ID, P waits a certain amount of time for any process with a higher ID to broadcast itself as the leader. If it does not receive this message in time, it re-broadcasts the election message.
If P gets an election message (inquiry) from another process with a lower ID it sends an "I am alive" message back and starts new elections.

Note that if P receives a victory message from a process with a lower ID number, it immediately initiates a new election. This is how the algorithm gets its name - a process with a higher ID number will bully a lower ID process out of the coordinator position as soon as it comes online.

References

^ Jean Dollimore, Tim Kindberg, George F. Coulouris, "Distributed systems : concepts and design (Third Edition)," in Distributed systems : concepts and design (Third Edition). Addison-Wesley, 2003.

Witchel, Emmett (2005). "Distributed Coordination". Retrieved May 4, 2005.
Hector Garcia-Molina, Elections in a Distributed Computing System, IEEE Transactions on Computers, Vol. C-31, No. 1, January (1982) 48-59
L. Lamport, R. Shostak, and M. Pease, "The Byzantine Generals Problem" ACM Transactions on Programming Languages and Systems, Vol. 4, No. 3, July 1982.

[1] Jean Dollimore, Tim Kindberg, George F. Coulouris, "Distributed systems : concepts and design (Third Edition)," in Distributed systems : concepts and design (Third Edition). Addison-Wesley, 2003.

[1]

Bully algorithm

Contents

Assumptions

Component calls

Algorithm structure

See also

References