=== Matrix multiplication ===
{{Math theorem
| math_statement = Any reordering of the classical Θ(''n''<sup>3</sup>) matrix multiplication algorithm on a machine with a fast memory of size ''M'' must move Ω(''n''<sup>3</sup>/√''M'') words between slow and fast memory.
}}
More general results for other numerical linear algebra operations can be found in Ballard et al.<ref>{{Cite journal |last1=Ballard |first1=G. |last2=Carson |first2=E. |last3=Demmel |first3=J. |last4=Hoemmen |first4=M. |last5=Knight |first5=N. |last6=Schwartz |first6=O. |date=May 2014 |title=Communication lower bounds and optimal algorithms for numerical linear algebra |url=http://dx.doi.org/10.1017/s0962492914000038 |journal=Acta Numerica |volume=23 |pages=1–155 |doi=10.1017/s0962492914000038 |s2cid=122513943 |issn=0962-4929|url-access=subscription }}</ref> The following proof is from Demmel and Dinh.<ref>{{Cite arXiv |last1=Demmel |first1=James |last2=Dinh |first2=Grace |date=2018-04-24 |title=Communication-Optimal Convolutional Neural Nets |class=cs.DS |eprint=1802.06905 }}</ref>
{{Math proof|title=Proof|proof=
* Measure of computation = Time per [[FLOP]] = γ
* Measure of communication = No. of words of data moved = β
⇒ Total running time = ''γ''·(no. of FLOPs) + ''β''·(no. of words moved)
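Writing ''F'' for the number of FLOPs and ''W'' for the number of words moved (symbols introduced here only for brevity), the model above can be rearranged as
<math display="block">T = \gamma F + \beta W = \gamma F\left(1 + \frac{\beta}{\gamma}\cdot\frac{W}{F}\right),</math>
so when ''β'' >> ''γ'' the running time is dominated by communication unless the ratio ''F''/''W'' (the arithmetic intensity) is large; minimizing ''W'' is therefore the goal of communication-avoiding algorithms.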
From the fact that ''β'' >> ''γ'' as measured in time and energy, communication cost dominates computation cost. Technological trends<ref name="DARPA_2008"/> indicate that the relative cost of communication is increasing on a variety of platforms, from [[cloud computing]] to [[supercomputers]] to mobile devices. The report also predicts that the gap between [[DRAM]] access time and FLOPs will grow 100× over the coming decade to balance power usage between processors and DRAM.<ref name="Demmel_2012"/>
 for i = 1 to n
     {read row i of A into fast memory}                 - n<sup>2</sup> reads (n words, n times)
     for j = 1 to n
         {read C(i,j) into fast memory}                 - n<sup>2</sup> reads
         {read column j of B into fast memory}          - n<sup>3</sup> reads (n words, n<sup>2</sup> times)
         for k = 1 to n
             C(i,j) = C(i,j) + A(i,k) * B(k,j)
         {write C(i,j) back to slow memory}             - n<sup>2</sup> writes
Here fast memory may be defined as the local processor memory ([[CPU cache]]) of size ''M'', and slow memory as the [[DRAM]].
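For concreteness, here is a minimal C sketch of the naive algorithm above (row-major storage assumed; the function name and layout are illustrative, not from any particular library):
<syntaxhighlight lang="c">
#include <stdio.h>

/* Naive n-by-n matrix multiply, C = C + A*B, row-major storage.
   Mirrors the pseudocode above: the j-loop re-reads a full column of B
   (n words) for every (i,j) pair, which is the source of the n^3 term
   in the communication cost. */
static void matmul_naive(int n, const double *A, const double *B, double *C)
{
    for (int i = 0; i < n; i++)          /* row i of A is reused across j */
        for (int j = 0; j < n; j++) {
            double cij = C[i*n + j];     /* read C(i,j)                   */
            for (int k = 0; k < n; k++)  /* stream column j of B          */
                cij += A[i*n + k] * B[k*n + j];
            C[i*n + j] = cij;            /* write C(i,j) back             */
        }
}

int main(void)
{
    double A[4] = {1, 2, 3, 4}, B[4] = {5, 6, 7, 8}, C[4] = {0};
    matmul_naive(2, A, B, C);
    printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]); /* 19 22 / 43 50 */
    return 0;
}
</syntaxhighlight>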
Communication cost (reads/writes): ''n''<sup>3</sup> + 3''n''<sup>2</sup> or O(''n''<sup>3</sup>)
Since the total running time is ''γ''·2''n''<sup>3</sup> + ''β''·(''n''<sup>3</sup> + 3''n''<sup>2</sup>) and ''β'' >> ''γ'', the communication cost dominates: the naive algorithm is communication-bound.
==== Blocked (tiled) matrix multiplication ====
 for i = 1 to n/b
     for j = 1 to n/b
         {read block C(i,j) into fast memory}           - b<sup>2</sup> × (n/b)<sup>2</sup> = n<sup>2</sup> reads
         for k = 1 to n/b
             {read block A(i,k) into fast memory}       - b<sup>2</sup> × (n/b)<sup>3</sup> = n<sup>3</sup>/b reads
             {read block B(k,j) into fast memory}       - b<sup>2</sup> × (n/b)<sup>3</sup> = n<sup>3</sup>/b reads
             C(i,j) = C(i,j) + A(i,k) * B(k,j)          - {do a matrix multiply on blocks}
         {write block C(i,j) back to slow memory}       - b<sup>2</sup> × (n/b)<sup>2</sup> = n<sup>2</sup> writes
Communication cost: 2''n''<sup>3</sup>/''b'' + 2''n''<sup>2</sup> reads/writes << 2''n''<sup>3</sup> arithmetic cost. Making ''b'' as large as possible while keeping three ''b'' × ''b'' blocks resident in fast memory, i.e. 3''b''<sup>2</sup> ≤ ''M'' so ''b'' ≈ √(''M''/3), gives communication cost O(''n''<sup>3</sup>/√''M''), which attains the lower bound above.
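A minimal C sketch of this tiled loop structure, assuming row-major storage and a caller-chosen block size ''b'' with 3''b''<sup>2</sup> ≤ ''M'' (the function and macro names are illustrative):
<syntaxhighlight lang="c">
/* Blocked (tiled) n-by-n matrix multiply, C = C + A*B, row-major storage.
   With 3*b*b <= M (three blocks resident in fast memory at once), the
   slow-memory traffic drops to roughly 2n^3/b + 2n^2 words. */
#define MIN(x, y) ((x) < (y) ? (x) : (y))

static void matmul_blocked(int n, int b, const double *A, const double *B,
                           double *C)
{
    for (int ii = 0; ii < n; ii += b)
        for (int jj = 0; jj < n; jj += b)
            for (int kk = 0; kk < n; kk += b)
                /* C(ii,jj) += A(ii,kk) * B(kk,jj) on b-by-b blocks */
                for (int i = ii; i < MIN(ii + b, n); i++)
                    for (int j = jj; j < MIN(jj + b, n); j++) {
                        double cij = C[i*n + j];
                        for (int k = kk; k < MIN(kk + b, n); k++)
                            cij += A[i*n + k] * B[k*n + j];
                        C[i*n + j] = cij;
                    }
}
</syntaxhighlight>
Production implementations additionally copy each block into a contiguous buffer and tune ''b'' to the cache hierarchy; the sketch shows only the loop reordering that reduces communication.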
{{Reflist|refs=
<ref name="DARPA_2008">Bergman, Keren, et al. "[http://staff.kfupm.edu.sa/ics/ahkhan/Resources/Articles/ExaScale%20Computing/TR-2008-13.pdf Exascale computing study: Technology challenges in exascale computing systems]." [[Defense Advanced Research Projects Agency]] [[Information Processing Techniques Office]] (DARPA IPTO), Tech. Rep 15 (2008).</ref>
<ref name="Shalf_2010">Shalf, John, Sudip Dosanjh, and John Morrison. "Exascale computing technology challenges". High Performance Computing for Computational Science–VECPAR 2010. Springer Berlin Heidelberg, 2011. 1–25.</ref>
<ref name="Frigo_1999">M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran,
<ref name="Toledo_1997">S. Toledo,
<ref name="Gustavson_1997">F. Gustavson,
<ref name="Elmroth_2004">E. Elmroth, F. Gustavson, I. Jonsson, and B. Kagstrom,
<ref name="Grigori_2014">[[Laura Grigori|Grigori, Laura]]. "[http://www.lifl.fr/jncf2014/files/lecture-notes/grigori.pdf Introduction to communication avoiding linear algebra algorithms in high performance computing].</ref>
}}