Bully algorithm

This is an old revision of this page, as edited by 37.11.126.80 (talk) at 20:08, 30 June 2015. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

The bully algorithm is a programming mechanism that applies a hierachy to nodes on a distributed system making them Coordinator or Slaves. This is used as a method in distributed computing for dynamically electing a coordinator by process ID number. The process with the highest process ID number is selected as the coordinator.

Assumptions

As this algorithm is part from a distributed system model wich tries to make it fail-free (like [1]), we need some assumptions as the whole point of the model is to make a distributed system free from arbitrary failures (not from processing ones) while saving some computational costs.

  • The system is synchronous and uses timeout for identifying process failure. (so you can have Delta and Cmax in order to calculate timeout as opposed to asynchronous systems)
  • Allows processes to crash during execution of algorithm. (To=2*Delta+Cmax so timer knows when omision fails happens)
  • Message delivery between processes should be reliable.(Coordinator dilema,¿is it trustworthy; or suplantation,inyection,replication,DoS may happen?)
  • Prior information about other process id's must be known. (This works as Leslie Lamport solution for Byzantine dilema, where coordinator needs a key and id for each process and where processors hierachy stipulates nodes as Generals, Commanders and Liutenants but without a key and with only coordinators and slaves)

Election type

This is how we apply the Bully-algorithm:

  • Election Message: Sent to announce faster election
  • Answer Message: Respond to the election message
  • Coordinator message: Sent to announce the identity of the elected process

Compare with Ring algorithm:

  • Assumes that system is synchronous
  • Uses timeout to detect process failure/crash
  • Each processor knows which processor has the higher identifier number and communicates with that[1]

When a process P determines that the current coordinator is down because of message timeouts or failure of the coordinator to initiate a handshake, it performs the following sequence of actions:

  1. P broadcasts an election message (inquiry) to all other processes with higher process IDs, expecting an "I am alive" response from them if they are alive.
  2. If P hears from no process with a higher process ID than it, it wins the election and broadcasts victory.
  3. If P hears from a process with a higher ID, P waits a certain amount of time for any process with a higher ID to broadcast itself as the leader. If it does not receive this message in time, it re-broadcasts the election message.
  4. If P gets an election message (inquiry) from another process with a lower ID it sends an "I am alive" message back and starts new elections.

Note that if P receives a victory message from a process with a lower ID number, it immediately initiates a new election. This is how the algorithm gets its name - a process with a higher ID number will bully a lower ID process out of the coordinator position as soon as it comes online.

See also

References

  1. ^ Jean Dollimore, Tim Kindberg, George F. Coulouris, "Distributed systems : concepts and design (Third Edition)," in Distributed systems : concepts and design (Third Edition). Addison-Wesley, 2003.
  • Witchel, Emmett (2005). "Distributed Coordination". Retrieved May 4, 2005.
  • Hector Garcia-Molina, Elections in a Distributed Computing System, IEEE Transactions on Computers, Vol. C-31, No. 1, January (1982) 48-59
  • L. Lamport, R. Shostak, and M. Pease, The Byzantine Generals Problem [1]