Matrix multiplication algorithm: Difference between revisions

Content deleted Content added
m Cache behavior: {{sfrac}}
Line 31:
which order is best also depends on whether the matrices are stored in [[row-major order]], column-major order, or a mix of both.
 
In particular, in the idealized case of a [[CPU cache#Associativity|fully associative cache]] consisting of {{mvar|M}} cache lines of {{mvar|b}} bytes each, the above algorithm is sub-optimal for {{mvar|A}} and {{mvar|B}} stored in row-major order. When {{math|''n'' > {{sfrac|''M''|''b''}}}}, every iteration of the inner loop (a simultaneous sweep through a row of {{mvar|A}} and a column of {{mvar|B}}) incurs a cache miss when accessing an element of {{mvar|B}}. This means that the algorithm incurs {{math|Θ(''n''<sup>3</sup>)}} cache misses in the worst case. {{As of|2010}}, the speed of memories compared to that of processors is such that the cache misses, rather than the actual calculations, dominate the running time for sizable matrices.<ref name="ocw">{{cite web |first1=Saman |last1=Amarasinghe |first2=Charles |last2=Leiserson |title=6.172 Performance Engineering of Software Systems, Lecture 8 |year=2010 |publisher=Massachusetts Institute of Technology |website=MIT OpenCourseWare |url=http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-172-performance-engineering-of-software-systems-fall-2010/video-lectures/lecture-8-cache-efficient-algorithms/ |accessdate=27 January 2015}}</ref>
 
The optimal variant of the iterative algorithm for {{mvar|A}} and {{mvar|B}} in row-major layout is a ''[[loop tiling|tiled]]'' version, where the matrix is implicitly divided into square tiles of size {{math|√''M''}} by {{math|√''M''}}:<ref name="ocw"/><ref>{{cite conference |first1=Monica S. |last1=Lam |first2=Edward E. |last2=Rothberg |first3=Michael E. |last3=Wolf |title=The Cache Performance and Optimizations of Blocked Algorithms |conference=Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS) |year=1991}}</ref>