Bully algorithm

This is an old revision of this page, as edited by 37.11.126.80 (talk) at 20:53, 30 June 2015. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

The bully algorithm is a programming mechanism that applies a hierachy to nodes on a distributed system making them Coordinator or Slaves. This is used as a method in distributed computing for dynamically electing a coordinator by process ID number. The process with the highest process ID number is selected as the coordinator.

Assumptions

As this algorithm is part from a distributed system model wich tries to make it fail-free (like the solution shown in[1]), we need some assumptions as the whole point of the model is to make a distributed system, free from arbitrary failures (not from processing ones) while saving some computational costs.

  • The system is synchronous and uses timeout for identifying process failure. (so you can have Delta and Cmax in order to calculate timeout as opposed to asynchronous systems where you can't calculate a timeout and then you can't distinguise between an omision fail on a process or a delay)
  • Allows processes to crash during execution of algorithm. (To=2*Delta+Cmax so timer knows when omision fails happens)
  • Message delivery between processes should be reliable.(Coordinator dilema,¿is it trustworthy; or suplantation,inyection,replication,DoS may happen?)
  • Prior information about other process id's must be known. (This works as Leslie Lamport solution for Byzantine dilema, where coordinator needs a key and id for each process and where processors hierachy stipulates nodes as Generals, Commanders and Liutenants but without a key and with only coordinator and slaves)

Component calls

This are the Bully-algorithm components:

  • Election Message: Sent to announce faster election
  • Answer Message: Respond to the election message
  • Coordinator message: Sent to announce the identity of the elected process

Compare with Ring algorithm:

  • Assumes that system is synchronous
  • Uses timeout to detect process failure/crash
  • Each processor knows which processor has the higher identifier number and communicates with that[1]

Algorithm structure

When a process P determines that the current coordinator is down because of message timeouts or failure of the coordinator to initiate a handshake, it performs the following sequence of actions:

  1. P broadcasts an election message (inquiry) to all other processes with higher process IDs, expecting an "I am alive" response from them if they are alive.
  2. If P hears from no process with a higher process ID than it, it wins the election and broadcasts victory.
  3. If P hears from a process with a higher ID, P waits a certain amount of time for any process with a higher ID to broadcast itself as the leader. If it does not receive this message in time, it re-broadcasts the election message.
  4. If P gets an election message (inquiry) from another process with a lower ID it sends an "I am alive" message back and starts new elections.

Note that if P receives a victory message from a process with a lower ID number, it immediately initiates a new election. This is how the algorithm gets its name - a process with a higher ID number will bully a lower ID process out of the coordinator position as soon as it comes online.

See also

References

  1. ^ Jean Dollimore, Tim Kindberg, George F. Coulouris, "Distributed systems : concepts and design (Third Edition)," in Distributed systems : concepts and design (Third Edition). Addison-Wesley, 2003.
  • Witchel, Emmett (2005). "Distributed Coordination". Retrieved May 4, 2005.
  • Hector Garcia-Molina, Elections in a Distributed Computing System, IEEE Transactions on Computers, Vol. C-31, No. 1, January (1982) 48-59
  • L. Lamport, R. Shostak, and M. Pease, "The Byzantine Generals Problem" ACM Transactions on Programming Languages and Systems, Vol. 4, No. 3, July 1982.