Hypercube (communication pattern): Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 18:03, 29 June 2019 edit McGucket (talk \| contribs) 130 edits Improve language. ← Previous edit		Latest revision as of 05:47, 17 February 2025 edit undo TheDragonFire (talk \| contribs) Extended confirmed users, Pending changes reviewers, Rollbackers 8,433 edits See also Hypercube internetwork topology Tag: 2017 wikitext editor
(3 intermediate revisions by 3 users not shown)
Line 1: <math>d</math>-dimensional ~~'''[[~~hypercube~~]]'''~~ is a network topology for parallel computers with <math>2^d</math> processing elements. The topology allows for an efficient implementation of some basic communication primitives such as [[Broadcasting (networking)\|Broadcast]], All-[[Reduce (parallel pattern)\|Reduce]], and [[Prefix sum]].<ref>Grama, A.(2003). Introduction to Parallel Computing. Addison Wesley; Auflage: 2 ed. {{ISBN\|978-0201648652}}.</ref> The processing elements are numbered <math>0</math> through <math>2^d - 1</math>. Each processing element is adjacent to processing elements whose numbers differ in one and only one bit. The algorithms described in this page utilize this structure efficiently.▼ ~~<!-- Please do not remove or change this AfD message until the discussion has been closed. -->~~ ~~{{Article for deletion/dated\|page=Hypercube (communication pattern)\|timestamp=20190628213446\|year=2019\|month=June\|day=28\|substed=yes}}~~ ~~<!-- Once discussion is closed, please place on talk page: {{Old AfD multi\|page=Hypercube (communication pattern)\|date=28 June 2019\|result='''keep'''}} -->~~ ~~<!-- End of AfD message, feel free to edit beyond this point -->~~ == Algorithm ~~Outline~~outline ==▼ ▲<math>d</math>-dimensional '''[[hypercube]]''' is a network topology for parallel computers with <math>2^d</math> processing elements. The topology allows for an efficient implementation of some basic communication primitives such as [[Broadcasting (networking)\|Broadcast]], All-[[Reduce (parallel pattern)\|Reduce]], and [[Prefix sum]].<ref>Grama, A.(2003). Introduction to Parallel Computing. Addison Wesley; Auflage: 2 ed. {{ISBN\|978-0201648652}}.</ref> The processing elements are numbered <math>0</math> through <math>2^d - 1</math>. Each processing element is adjacent to processing elements whose numbers differ in one and only one bit. The algorithms described in this page utilize this structure efficiently. ▲== Algorithm Outline == Most of the communication primitives presented in this article share a common template.<ref>Foster, I.(1995). Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering. Addison Wesley; {{ISBN\|0201575949}}.</ref> Initially, each processing element possesses one message that must reach every other processing element during the course of the algorithm. The following pseudo code sketches the communication steps necessary. Hereby, '''Initialization''', '''Operation''', and '''Output''' are placeholders that depend on the given communication primitive (see next section). Line 48 ⟶ 43: [[File:Hypergraph Communication Steps for Prefix Sum.png\|thumb\|Example for a prefix sum calculation. Upper number: tentatetive prefix sum (variable <math>x</math>). Lower number: sum over all elements in the sub cube (variable <math>\sigma</math>).]] === All-~~Gather~~gather / ~~All~~all-~~Reduce~~reduce === '''All-~~Gather~~gather''' operations start with each processing element having a message <math>m_i</math>. The goal of the operation is for each processing element to know the messages of all other processing elements, i.e. <math>x := m_0 \cdot m_1 \dots m_p</math> where <math>\cdot</math> is concatenation. The operation can be implemented following the algorithm template. '''Input''': message <math>x := m_i</math> at processing unit <math>i</math>. Line 65 ⟶ 60: The same principle can be applied to the '''All-Reduce''' operations, but instead of concatenating the messages, it performs a reduction operation on the two messages. So it is a '''Reduce''' operation, where all processing units know the result. Compared to a normal reduce operation followed by a broadcast, All-Reduce in hypercubes reduces the number of communication steps. === All-to-~~All~~all === Here every processing element has a unique message for all other processing elements. '''Input:''' message <math>m_{ij}</math> at processing element <math>i</math> to processing element <math>j</math>. '''for''' <math>d > k \geq 0</math> '''do''' '''Receive''' from processing element <math>i \text{ XOR } 2^k</math>: all messages for my <math>k</math>-dimensional sub cube '''Send''' to processing element <math>i \text{ XOR } 2^k</math>: all messages for its <math>k</math>-dimensional sub cube '''endfor''' Line 80 ⟶ 75: This results in a run-time of <math>T(n,p) \approx \log{p} (T_\text{start} + \frac{p}{2}nT_\text{byte})</math>. == ESBT-~~Broadcast~~broadcast == The ESBT-broadcast (Edge-disjoint Spanning Binomial Tree) algorithm<ref>{{cite journal\|last1=Johnsson\|first1=S.L.\|last2=Ho\|first2=C.-T.\|title=Optimum broadcasting and personalized communication in hypercubes\|journal=IEEE Transactions on Computers\|volume=38\|issue=9\|year=1989\|pages=1249–1268\|issn=0018-9340\|doi=10.1109/12.29465}}</ref> is a pipelined broadcast algorithm with optimal runtime for clusters with hypercube network topology. The algorithm embeds <math>d</math> edge-disjoint binomial trees in the hypercube, such that each neighbor of processing element <math>0</math> is the root of a spanning binomial tree on <math>2^d - 1</math> nodes. To broadcast a message, the source node splits its message into <math>k</math> chunks of equal size and cyclically sends them to the roots of the binomial trees. Upon receiving a chunk, the binomial trees broadcast it. Line 86 ⟶ 81: In each step, the source node sends one of its <math>k</math> chunks to a binomial tree. Broadcasting the chunk within the binomial tree takes <math>d</math> steps. Thus, it takes <math>k</math> steps to distribute all chunks and additionally <math>d</math> steps until the last binomial tree broadcast has finished, resulting in <math>k + d</math> steps overall. Therefore, the runtime for a message of length <math>n</math> is <math>T(n, p, k) = \left(\frac{n}{k} T_\text{byte} + T_\text{start} \right) (k + d)</math>. With the optimal chunk size <math>k^* = \sqrt{\frac{nd \cdot T_\text{byte}}{T_\text{start}}}</math>, the optimal runtime of the algorithm is <math>T^(n, p) = n \cdot T_\text{byte} + \log(p) \cdot T_\text{start} + \sqrt{n \log(p) \cdot T_\text{start} \cdot T_\text{byte}}</math>. === Construction of the ~~Binomial~~binomial ~~Trees~~trees === [[File:HypergraphESBT.png\|thumb\|A <math>3</math>-dimensional hypercubes with three ESBT embedded.]] This section describes how to construct the binomial trees systematically. First, construct a single binomial spanning tree von <math>2^d</math> nodes as follows. Number the nodes from <math>0</math> to <math>2^d - 1</math> and consider their binary representation. Then the children of each nodes are obtained by negating single leading zeroes. This results in a single binomial spanning tree. To obtain <math>d</math> edge-disjoint copies of the tree, translate and rotate the nodes: for the <math>k</math>-th copy of the tree, apply a XOR operation with <math>2^k</math> to each node. Subsequently, right-rotate all nodes by <math>k</math> digits. The resulting binomial trees are edge-disjoint and therefore fulfill the requirements for the ESBT-broadcasting algorithm. == See also == [[Hypercube internetwork topology]] ==References==