</syntaxhighlight>
The reason for this speedup is that in the first case, the reads of <code>A[i][k]</code> are in cache (since the <code>k</code> index is the contiguous, last dimension), but the reads of <code>B[k][j]</code> are not (there the <code>k</code> index is the first dimension, so consecutive iterations jump a whole row apart), and so <code>B[k][j]</code> incurs a cache miss penalty. <code>C[i][j]</code> is irrelevant, because it can be hoisted out of the inner loop, since the inner-loop variable is <code>k</code>:
<syntaxhighlight lang="pascal" line="1">
for i in 0..n
for j in 0..m
temp = C[i][j]
for k in 0..p
      temp = temp + A[i][k] * B[k][j]
C[i][j] = temp
</syntaxhighlight>
In the second case, the inner loop index is <code>j</code>, so the reads and writes of <code>C[i][j]</code> and the reads of <code>B[k][j]</code> are all sequential and in cache, and the read of <code>A[i][k]</code> can be hoisted out of the inner loop:
<syntaxhighlight lang="pascal" line="1">
for i in 0..n
for k in 0..p
temp = A[i][k]
for j in 0..m
      C[i][j] = C[i][j] + temp * B[k][j]
</syntaxhighlight>
Thus, the second example incurs no cache miss penalty in the inner loop, while the first example does.
On a typical 2014-era processor, the second case is approximately five times faster than the first, when written in [[C (programming language)|C]] and compiled with <code>gcc -O3</code>. (A careful examination of the disassembled code shows that [[GNU Compiler Collection|GCC]] uses [[SIMD]] instructions in the first case but not in the second; even so, the cache penalty far outweighs the SIMD gain.){{Citation needed|date=September 2014}}
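The two loop orders above can be written as a complete C program. This is a minimal sketch, not the article's benchmark: the matrix size <code>N</code> and the array names are illustrative, and no timing is performed, but it shows that both orders produce the same product (they even perform the same additions in the same sequence, so the floating-point results are bitwise identical).
<syntaxhighlight lang="c">
#include <stdio.h>

#define N 64  /* illustrative size; large enough to show the loop structure */

static double A[N][N], B[N][N], C1[N][N], C2[N][N];

int main(void) {
    /* Fill A and B with simple deterministic values. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = (double)(i + j);
            B[i][j] = (double)(i - j);
        }

    /* First case: i-j-k order. The inner loop strides down a
       column of B, touching a different cache line each step. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double temp = 0.0;
            for (int k = 0; k < N; k++)
                temp = temp + A[i][k] * B[k][j];
            C1[i][j] = temp;
        }

    /* Second case: i-k-j order. A[i][k] is hoisted into temp, and the
       inner loop walks rows of C2 and B contiguously in memory. */
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++) {
            double temp = A[i][k];
            for (int j = 0; j < N; j++)
                C2[i][j] = C2[i][j] + temp * B[k][j];
        }

    /* Verify that both loop orders computed the same matrix. */
    int same = 1;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (C1[i][j] != C2[i][j])
                same = 0;
    printf("results match: %s\n", same ? "yes" : "no");
    return 0;
}
</syntaxhighlight>
Compiled with <code>gcc -O3</code> and timed on larger matrices, the second version illustrates the cache effect described above.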