which order is best also depends on whether the matrices are stored in [[row-major order]], column-major order, or a mix of both.
In particular, in the idealized case of a [[CPU cache#Associativity|fully associative cache]] consisting of {{mvar|M}} cache lines of {{mvar|b}} bytes each, the above algorithm is sub-optimal for {{mvar|A}} and {{mvar|B}} stored in row-major order. When {{math|''n'' > {{sfrac|''M''
The optimal variant of the iterative algorithm for {{mvar|A}} and {{mvar|B}} in row-major layout is a ''[[loop tiling|tiled]]'' version, where the matrix is implicitly divided into square tiles of size {{math|√''M''}} by {{math|√''M''}}:<ref name="ocw"/><ref>{{cite conference |first1=Monica S. |last1=Lam |first2=Edward E. |last2=Rothberg |first3=Michael E. |last3=Wolf |title=The Cache Performance and Optimizations of Blocked Algorithms |conference=Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS) |year=1991}}</ref>
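
As a rough illustration of the tiling idea, the following Python sketch multiplies two square matrices block by block. The names <code>tiled_matmul</code>, <code>tile_size</code> and the <code>tile</code> parameter are hypothetical and chosen only for this example. The sketch assumes square matrices stored as lists of rows, and takes the tile side to be the integer part of √''M''. Python lists do not reproduce the memory layout of a real row-major array, so it shows only the loop structure, not the cache behavior.

```python
import math


def tile_size(M):
    """Illustrative tile side: the integer part of sqrt(M), at least 1."""
    return max(1, math.isqrt(M))


def tiled_matmul(A, B, tile):
    """Multiply square n x n matrices A and B using tile x tile blocks."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    # The three outer loops walk over tiles of C, A and B.
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, n, tile):
                # The inner loops multiply one tile of A by one tile of B
                # and accumulate into the matching tile of C.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = C[i][j]
                        for k in range(k0, min(k0 + tile, n)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C
```

The <code>min(...)</code> bounds let the sketch handle matrix sizes that are not a multiple of the tile side. The result is the same as that of the untiled triple loop; only the order in which the partial products are accumulated changes.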