Revision as of 21:09, 29 November 2015 edit Qwertyus (talk \| contribs) Extended confirmed users 31,640 edits →Iterative algorithm: c/e ← Previous edit		Revision as of 21:10, 29 November 2015 edit undo Qwertyus (talk \| contribs) Extended confirmed users 31,640 edits m →Cache behavior: {{sfrac}} Next edit →
Line 31: which order is best also depends on whether the matrices are stored in [[row-major order]], column-major order, or a mix of both. In particular, in the idealized case of a [[CPU cache#Associativity\|fully associative cache]] consisting of {{mvar\|M}} cache lines of {{mvar\|b}} bytes each, the above algorithm is sub-optimal for {{mvar\|A}} and {{mvar\|B}} stored in row-major order. When {{math\|''n'' > {{sfrac\|''M''∕\|''b''}}}}, every iteration of the inner loop (a simultaneous sweep through a row of {{mvar\|A}} and a column of {{mvar\|B}}) incurs a cache miss when accessing an element of {{mvar\|B}}. This means that the algorithm incurs {{math\|Θ(''n''<sup>3</sup>)}} cache misses in the worst case. {{As of\|2010}}, the speed of memories compared to that of processors is such that the cache misses, rather than the actual calculations, dominate the running time for sizable matrices.<ref name="ocw">{{cite web \|first1=Saman \|last1=Amarasinghe \|first2=Charles \|last2=Leiserson \|title=6.172 Performance Engineering of Software Systems, Lecture 8 \|year=2010 \|publisher=Massachusetts Institute of Technology \|website=MIT OpenCourseWare \|url=http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-172-performance-engineering-of-software-systems-fall-2010/video-lectures/lecture-8-cache-efficient-algorithms/ \|accessdate=27 January 2015}}</ref> The optimal variant of the iterative algorithm for {{mvar\|A}} and {{mvar\|B}} in row-major layout is a ''[[loop tiling\|tiled]]'' version, where the matrix is implicitly divided into square tiles of size {{math\|√''M''}} by {{math\|√''M''}}:<ref name="ocw"/><ref>{{cite conference \|first1=Monica S. \|last1=Lam \|first2=Edward E. \|last2=Rothberg \|first3=Michael E. \|last3=Wolf \|title=The Cache Performance and Optimizations of Blocked Algorithms \|conference=Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS) \|year=1991}}</ref>

Matrix multiplication algorithm: Difference between revisions