Content deleted Content added
See also Hypercube internetwork topology |
|||
(21 intermediate revisions by 13 users not shown) | |||
Line 1:
== Algorithm
Most of the communication primitives presented in this article share a common template.<ref>Foster, I.(1995). Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering. Addison Wesley; {{ISBN
'''Input:''' message <math>m</math>.
Line 11:
<math>y := i \text{ XOR } 2^k</math>
'''Send''' <math>s</math> '''to''' <math>y</math>
'''
'''Operation'''<math>(s, m)</math>
'''endfor'''
'''Output'''
Each processing element iterates over its neighbors (the expression <math>i \text{ XOR } 2^k</math> negates the <math>k</math>-th bit in <math>i</math>'s binary representation, therefore obtaining the numbers of its neighbors).
[[File:Hypergraph Communication Pattern.png|thumb|Algorithm outline applied to the <math>3</math>-dimensional hypercube. In the first step (before any communication), each processing element possesses one message (blue). Communication is marked red. After each step, the processing elements store the received message, but other operations are also possible.]]
== Communication
=== Prefix sum ===
'''Input''': message <math>m_i</math> of processor <math>i</math>.
'''Output''': prefix sum <math>\bigoplus_{0 \le j \le i} m_j</math> of processor <math>i</math>.
<math>x := m_i</math>
<math>\sigma := m_i</math>
Line 38:
The algorithm works as follows. Observe that hypercubes of dimension <math>d</math> can be split into two hypercubes of dimension <math>d - 1</math>. Refer to the sub cube containing nodes with a leading 0 as the 0-sub cube and the sub cube consisting of nodes with a leading 1 as 1-sub cube. Once both sub cubes have calculated the prefix sum, the sum over all elements in the 0-sub cube has to be added to the every element in the 1-sub cube, since every processing element in the 0-sub cube has a lower rank than the processing elements in the 1-sub cube. The pseudo code stores the prefix sum in variable <math>x</math> and the sum over all nodes in a sub cube in variable <math>\sigma</math>.
This makes it possible for all nodes in 1-sub cube to receive the sum over the 0-sub cube in every step.
This results in a factor of <math>\log p</math> for <math>T_\text{start}</math> and a factor of <math>n\log p</math> for <math>T_\text{byte}</math>: <math>T(n,p) = (T_\text{start} + nT_\text{byte})\log p</math>.
[[File:Hypergraph Communication Steps for Prefix Sum.png|thumb|Example for a prefix sum calculation. Upper number:
=== All-
'''All-
'''Input''': message <math>x := m_i</math> at processing unit <math>i</math>.
'''Output''': all messages <math>m_1 \cdot m_2 \dots m_p</math>.
<math>x := m_i</math>
Line 55 ⟶ 56:
'''endfor'''
With each iteration, the transferred message doubles in length. This leads to a
The same principle can be applied to the '''All-Reduce''' operations, but instead of concatenating the messages, it performs
=== All-to-
Here every processing element has a unique message for all other processing elements.
'''Input:''' message <math>m_{ij}</math> at processing element <math>i</math> to processing element <math>j</math>.
'''for''' <math>d > k \geq 0</math> '''do'''
'''Receive''' from processing element <math>i \text{ XOR } 2^k</math>:
all messages for my <math>k</math>-dimensional sub cube
'''Send''' to processing element <math>i \text{ XOR } 2^k</math>:
all messages for its <math>k</math>-dimensional sub cube
'''endfor'''
With each iteration a messages comes closer to its destination by one dimension, if it hasn't arrived yet.
This results in a run-time of <math>T(n,p) \approx \log{p} (T_\text{start} + \frac{p}{2}nT_\text{byte})</math>.
== ESBT-
The ESBT-broadcast (Edge-disjoint Spanning Binomial Tree) algorithm<ref>{{cite journal|last1=Johnsson|first1=S.L.|last2=Ho|first2=C.-T.|title=Optimum broadcasting and personalized communication in hypercubes|journal=IEEE Transactions on Computers|volume=38|issue=9|year=1989|pages=1249–1268|issn=
=== Runtime ===
=== Construction of the
[[File:HypergraphESBT.png|thumb|A <math>3</math>-dimensional hypercubes with three ESBT embedded.]]
This section describes how to construct the binomial trees systematically. First, construct a single binomial spanning tree von <math>2^d</math> nodes as follows. Number the nodes from <math>0</math> to <math>2^d - 1</math> and consider their binary representation. Then the children of each nodes are obtained by negating single leading zeroes. This results in a single binomial spanning tree. To obtain <math>d</math> edge-disjoint copies of the tree, translate and rotate the nodes: for the <math>k</math>-th copy of the tree, apply a XOR operation with <math>2^k</math> to each node.
== See also ==
* [[Hypercube internetwork topology]]
==References==
{{Reflist}}
[[Category:Parallel computing]]
|