Content deleted Content added
m Typo fixing, replaced: ,m → , m (9), ,p → , p (3), halfed → halved |
Typo fixing, replaced: are send → are sent (4), leafs → leaves (2) |
||
Line 41:
==== Runtime analysis ====
A simple analysis for the algorithm uses the BSP-model and incorporates the time <math>T_{start}</math> needed to initiate communication and <math>T_{byte}</math> the time needed to send a byte. Then the resulting runtime is <math>\Theta((T_{start} + n \cdot T_{byte})\cdot log(p))</math>, as <math>m</math> elements of a vector are
== Pipeline-algorithm ==
Line 55:
It is important to note that the send and receive operations have to be executed concurrently for the algorithm to work. The result vector is stored at <math>p_{p-1}</math> at the end. The associated animation shows an execution of the algorithm on vectors of size four with five processing units. Two steps of the animation visualize one parallel execution step. The number of steps in the parallel execution are <math>p + m -2</math>, it takes <math>p-1</math> steps until the last processing unit receives its first element and additional <math>m-1</math> until all elements are received. Therefore, the runtime in the BSP-model is <math>T(n, p, m) = (T_{start} + \frac{n}{m}T_{byte})(p+m-2)</math>, assuming that <math>n</math> is the total byte-size of a vector.
Although <math>m</math> has a fixed value, it is possible to logically group elements of a vector together and reduce <math>m</math>. For example, a problem instance with vectors of size four can be handled by splitting the vectors into the first two and last two elements, which are always transmitted and computed together. In this case, double the volume is
== Pipelined tree ==
Line 62:
=== Algorithm description ===
The animation shows the execution of such an algorithm in a [[Duplex (telecommunications)|full-duplex]] communication model. Communication links are represented by black lines between the vectors of elements and build a Fibonacci tree of size seven in this example. If an element is
The algorithm itself propagates the partial sums from bottom to top until all elements are contained in the sum at the root processor on top. In the first step of the execution, the processing units which are
=== Runtime ===
|