Content deleted Content added
m Remove blank line(s) between list items per WP:LISTGAP to fix an accessibility issue for users of screen readers. Do WP:GENFIXES and cleanup if needed. Discuss this at Wikipedia talk:WikiProject Accessibility#LISTGAP |
→Example: Matrix multiply: Wording cleanup (could use a lot more work) |
||
Line 7:
In [[computer science]] and particularly in [[compiler]] design, '''loop nest optimization''' (LNO) is an optimization technique that applies a set of [[loop transformation]]s for the purpose of [[Locality of reference|locality]] optimization or parallelization or other loop overhead reduction of the loop nests. One classical usage is to reduce memory access latency or the cache bandwidth necessary due to cache reuse for some common linear algebra [[algorithm]]s.
==Example:
Many large mathematical operations on computers end up spending much of their time doing [[matrix multiplication]]. The operation is:
Line 72:
The code above does not use the cache very well. During the calculation of a horizontal stripe of C results, one horizontal stripe of B is loaded and the entire matrix A is loaded. For the entire calculation, C is stored once (that's good), B is loaded into the cache once (assuming a stripe of B fits in the cache with a stripe of A), but A is loaded N/ib times, where ib is the size of the strip in the C matrix, for a total of N<sup>3</sup>/ib doubleword loads from main memory. In the code above, ''ib'' is 2.
The next step to reducing the memory traffic is to make ib as large as possible. It needs to be larger than the "balance" number reported by streams. In the case of one particular 2.8 GHz Pentium 4 system used for this example, the balance number is 16.5. The second code example above cannot be extended directly, since that would require many more accumulator registers. Instead, the loop is ''blocked'' over i. (Technically, this is actually the second time i is blocked, as the first time was the factor of 2.)
<source lang="c">
|