 
=== Matrix multiplication ===
<ref>{{Cite book |last1=Jia-Wei |first1=Hong |last2=Kung |first2=H. T. |title=Proceedings of the thirteenth annual ACM symposium on Theory of computing - STOC '81 |chapter=I/O complexity: The red-blue pebble game |date=1981 |pages=326–333 |chapter-url=http://dx.doi.org/10.1145/800076.802486 |___location=New York, New York, USA |publisher=ACM Press |doi=10.1145/800076.802486 |s2cid=8410593 }}</ref> Corollary 6.2:
 
{{Math theorem
}}
 
More general results for other numerical linear algebra operations can be found in Ballard et al.<ref>{{Cite journal |last1=Ballard |first1=G. |last2=Carson |first2=E. |last3=Demmel |first3=J. |last4=Hoemmen |first4=M. |last5=Knight |first5=N. |last6=Schwartz |first6=O. |date=May 2014 |title=Communication lower bounds and optimal algorithms for numerical linear algebra |url=http://dx.doi.org/10.1017/s0962492914000038 |journal=Acta Numerica |volume=23 |pages=1–155 |doi=10.1017/s0962492914000038 |s2cid=122513943 |issn=0962-4929 |url-access=subscription }}</ref> The following proof is from Demmel and Dinh.<ref>{{Cite arXiv |last1=Demmel |first1=James |last2=Dinh |first2=Grace |date=2018-04-24 |title=Communication-Optimal Convolutional Neural Nets |class=cs.DS |eprint=1802.06905 }}</ref>
 
{{Math proof|title=Proof|proof=
* Measure of computation = Time per [[FLOP]] = γ
* Measure of communication = No. of words of data moved = β
⇒ Total running time = γ·(no. of [[FLOP]]s) + β·(no. of words)
 
Since ''β'' >> ''γ'' as measured in time and energy, communication cost dominates computation cost. Technological trends<ref name="DARPA_2008"/> indicate that the relative cost of communication is increasing on a variety of platforms, from [[cloud computing]] to [[supercomputers]] to mobile devices. The report also predicts that the gap between [[DRAM]] access time and FLOPs will increase 100× over the coming decade to balance power usage between processors and DRAM.<ref name="Demmel_2012"/>
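To make the cost model concrete, here is a small worked example in Python. The values of γ and β are hypothetical, chosen only so that ''β'' >> ''γ''; real values are machine-dependent.

```python
# Cost model: total time = gamma * (#FLOPs) + beta * (#words moved).
# gamma and beta below are illustrative assumptions, not measured values.
gamma = 1e-9          # seconds per FLOP (assumed)
beta = 1e-7           # seconds per word moved (assumed, 100x gamma)

n = 1000
flops = 2 * n**3              # multiply-adds in a dense n-by-n matrix multiply
words = n**3 + 3 * n**2       # slow-memory traffic of a naive triple loop

time_compute = gamma * flops  # 2.0 seconds
time_comm = beta * words      # about 100.3 seconds: communication dominates
```

Even with only a 100× gap between β and γ, the communication term is roughly fifty times larger than the computation term in this example.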
 
 for i = 1 to n
   {read row i of A into fast memory} - n<sup>2</sup> reads
   for j = 1 to n
     {read C(i,j) into fast memory} - n<sup>2</sup> reads
     {read column j of B into fast memory} - n<sup>3</sup> reads
     for k = 1 to n
       C(i,j) = C(i,j) + A(i,k) * B(k,j)
     {write C(i,j) back to slow memory} - n<sup>2</sup> writes
 
Here, fast memory may be defined as the local processor memory ([[CPU cache]]) of size ''M'', and slow memory may be defined as the DRAM.
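The naive algorithm above can be sketched as runnable Python, instrumented to count the words moved between slow and fast memory. The counters are an illustrative addition, not part of the original pseudocode.

```python
def naive_matmul(A, B):
    """Naive triple-loop multiply, counting slow-memory traffic in words."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    reads = writes = 0
    for i in range(n):
        reads += n              # read row i of A: n words
        for j in range(n):
            reads += 1          # read C(i,j): 1 word
            reads += n          # read column j of B: n words
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
            writes += 1         # write C(i,j) back: 1 word
    return C, reads, writes

n = 4
I = [[float(i == j) for j in range(n)] for i in range(n)]
C, reads, writes = naive_matmul(I, I)
# reads + writes = n^3 + 3n^2 = 112 words for n = 4
```

Summing the counters reproduces the ''n''<sup>3</sup> + 3''n''<sup>2</sup> total stated below the pseudocode.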
Communication cost (reads/writes): ''n''<sup>3</sup> + 3''n''<sup>2</sup> or O(''n''<sup>3</sup>)
 
Since total running time = ''γ''·O(''n''<sup>3</sup>) + ''β''·O(''n''<sup>3</sup>) and ''β'' >> ''γ'', the communication cost is dominant. The blocked (tiled) matrix multiplication algorithm<ref name="Demmel_2012"/> reduces this dominant term:
 
==== Blocked (tiled) matrix multiplication ====
 for i = 1 to n/b
   for j = 1 to n/b
     {read block C(i,j) into fast memory} - b<sup>2</sup> × (n/b)<sup>2</sup> = n<sup>2</sup> reads
     for k = 1 to n/b
       {read block A(i,k) into fast memory} - b<sup>2</sup> × (n/b)<sup>3</sup> = n<sup>3</sup>/b reads
       {read block B(k,j) into fast memory} - b<sup>2</sup> × (n/b)<sup>3</sup> = n<sup>3</sup>/b reads
       C(i,j) = C(i,j) + A(i,k) * B(k,j) - {do a matrix multiply on blocks}
     {write block C(i,j) back to slow memory} - b<sup>2</sup> × (n/b)<sup>2</sup> = n<sup>2</sup> writes
 
Communication cost: 2''n''<sup>3</sup>/''b'' + 2''n''<sup>2</sup> reads/writes << 2''n''<sup>3</sup> arithmetic cost
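The blocked algorithm can likewise be sketched in Python, counting each ''b''×''b'' block transfer as ''b''<sup>2</sup> words of slow-memory traffic; the counters are illustrative instrumentation added for this sketch.

```python
def blocked_matmul(A, B, b):
    """Blocked (tiled) multiply with block size b; n must be divisible by b.
    Each b-by-b block transfer is counted as b*b words of traffic."""
    n = len(A)
    assert n % b == 0
    C = [[0.0] * n for _ in range(n)]
    reads = writes = 0
    for i in range(0, n, b):
        for j in range(0, n, b):
            reads += b * b                  # read block C(i,j)
            for k in range(0, n, b):
                reads += 2 * b * b          # read blocks A(i,k) and B(k,j)
                for ii in range(i, i + b):  # block multiply, entirely in fast memory
                    for jj in range(j, j + b):
                        for kk in range(k, k + b):
                            C[ii][jj] += A[ii][kk] * B[kk][jj]
            writes += b * b                 # write block C(i,j)
    return C, reads, writes

n, b = 8, 2
I = [[float(i == j) for j in range(n)] for i in range(n)]
C, reads, writes = blocked_matmul(I, I, b)
# traffic = 2n^3/b + 2n^2 = 640 words, versus n^3 + 3n^2 = 704 for the naive loop
```

Even at this tiny size the dominant ''n''<sup>3</sup> term is cut by the factor ''b''; choosing ''b'' so that three blocks fit in fast memory of size ''M'' gives the usual ''b'' ≈ √(''M''/3).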
<ref name="DARPA_2008">Bergman, Keren, et al. "[http://staff.kfupm.edu.sa/ics/ahkhan/Resources/Articles/ExaScale%20Computing/TR-2008-13.pdf Exascale computing study: Technology challenges in exascale computing systems]." [[Defense Advanced Research Projects Agency]] [[Information Processing Techniques Office]] (DARPA IPTO), Tech. Rep 15 (2008).</ref>
<ref name="Shalf_2010">Shalf, John, Sudip Dosanjh, and John Morrison. "Exascale computing technology challenges". High Performance Computing for Computational Science–VECPAR 2010. Springer Berlin Heidelberg, 2011. 1–25.</ref>
<ref name="Frigo_1999">M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran, "Cache-oblivious algorithms", In FOCS '99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science, 1999. IEEE Computer Society.</ref>
<ref name="Toledo_1997">S. Toledo, "[https://web.archive.org/web/20180102191420/https://pdfs.semanticscholar.org/d198/43912d46f6a25de815eadb1fb43d5ca6f61c.pdf Locality of reference in LU Decomposition with partial pivoting]," SIAM J. Matrix Anal. Appl., vol. 18, no. 4, 1997.</ref>
<ref name="Gustavson_1997">F. Gustavson, "Recursion Leads to Automatic Variable Blocking for Dense Linear-Algebra Algorithms," IBM Journal of Research and Development, vol. 41, no. 6, pp. 737–755, 1997.</ref>
<ref name="Elmroth_2004">E. Elmroth, F. Gustavson, I. Jonsson, and B. Kagstrom, "[http://www.csc.kth.se/utbildning/kth/kurser/2D1253/matalg06/SIR000003.pdf Recursive blocked algorithms and hybrid data structures for dense matrix library software]," SIAM Review, vol. 46, no. 1, pp. 3–45, 2004.</ref>
<ref name="Grigori_2014">[[Laura Grigori|Grigori, Laura]]. "[http://www.lifl.fr/jncf2014/files/lecture-notes/grigori.pdf Introduction to communication avoiding linear algebra algorithms in high performance computing]."</ref>
}}