Revision as of 22:47, 9 October 2019 edit WereSpielChequers (talk \| contribs) Bureaucrats, Administrators 352,698 edits m Typo fixing, replaced: ,m → , m (9), ,p → , p (3), halfed → halved Tag: AWB ← Previous edit		Revision as of 12:09, 24 December 2019 edit undo John of Reading (talk \| contribs) Autopatrolled, Extended confirmed users, Pending changes reviewers 787,576 edits Typo fixing, replaced: are send → are sent (4), leafs → leaves (2) Tag: AWB Next edit →
Line 41: ==== Runtime analysis ==== A simple analysis for the algorithm uses the BSP-model and incorporates the time <math>T_{start}</math> needed to initiate communication and <math>T_{byte}</math> the time needed to send a byte. Then the resulting runtime is <math>\Theta((T_{start} + n \cdot T_{byte})\cdot log(p))</math>, as <math>m</math> elements of a vector are ~~send~~sent in each iteration and have size <math>n</math> in total. == Pipeline-algorithm == Line 55: It is important to note that the send and receive operations have to be executed concurrently for the algorithm to work. The result vector is stored at <math>p_{p-1}</math> at the end. The associated animation shows an execution of the algorithm on vectors of size four with five processing units. Two steps of the animation visualize one parallel execution step. The number of steps in the parallel execution are <math>p + m -2</math>, it takes <math>p-1</math> steps until the last processing unit receives its first element and additional <math>m-1</math> until all elements are received. Therefore, the runtime in the BSP-model is <math>T(n, p, m) = (T_{start} + \frac{n}{m}T_{byte})(p+m-2)</math>, assuming that <math>n</math> is the total byte-size of a vector. Although <math>m</math> has a fixed value, it is possible to logically group elements of a vector together and reduce <math>m</math>. For example, a problem instance with vectors of size four can be handled by splitting the vectors into the first two and last two elements, which are always transmitted and computed together. In this case, double the volume is ~~send~~sent each step, but the number of steps has roughly halved. It means that the parameter <math>m</math> is halved, while the total byte-size <math>n</math> stays the same. The runtime <math>T(p)</math> for this approach depends on the value of <math>m</math>, which can be optimized if <math>T_{start}</math> and <math>T_{byte}</math> are known. It is optimal for <math>m = \sqrt{\frac{n \cdot (p-2)T_{byte}}{T_{start}}}</math>, assuming that this results in a smaller <math>m</math> that divides the original one. == Pipelined tree == Line 62: === Algorithm description === The animation shows the execution of such an algorithm in a [[Duplex (telecommunications)\|full-duplex]] communication model. Communication links are represented by black lines between the vectors of elements and build a Fibonacci tree of size seven in this example. If an element is ~~send~~sent to another processing unit the link is colored with the color of the corresponding element. An element that is received by a processor is added to the already existing element of same color (at the same index in the vector). The algorithm itself propagates the partial sums from bottom to top until all elements are contained in the sum at the root processor on top. In the first step of the execution, the processing units which are ~~leafs~~leaves in the underlying tree send their first elements to their parent. This is similar to the send operations of the binomial tree algorithm with the key difference that the leaf units each have two more elements which have to be ~~send~~sent and therefore do not become inactive, but can continue to send elements, which is analogous to the pipelined approach and improves efficiency. Processing units that are not ~~leafs~~leaves start to send their elements in order of the indices in the vector once they have received an element from a child. In the example they send green, blue and red elements in this order. If two processors compete to send their elements to the same processor, then the element of the right child is received first. Because of the structure of the Fibonacci tree all processors send or receive elements while the "pipeline" is filled. The pipeline is filled from the point where each unit has received an element and until the leaf units have no more elements to send. === Runtime ===

Reduce (parallel pattern): Difference between revisions