Loop nest optimization: Difference between revisions

There are three problems to solve:
 
* Floating-point additions take some number of cycles to complete. To keep an [[adder (electronics)|adder]] with multi-cycle latency busy, the code must update multiple [[Accumulator (computing)|accumulators]] in parallel.
 
* Machines can typically do just one memory operation per [[multiply–add]], so values loaded must be reused at least twice.
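
The two points above can be illustrated with a minimal sketch in C, assuming row-major square matrices whose dimension <code>n</code> is even. The names <code>block_2x2</code> and <code>matmul_2x2_blocked</code> are hypothetical and chosen only for this illustration. Each inner-loop iteration performs four loads and four multiply–adds, so every loaded value is used twice, and the four sums are kept in independent accumulators so that one addition need not wait for another to finish.

```c
#include <stddef.h>

/* Hypothetical helper: computes the 2x2 block of C = A * B whose top-left
 * element is C[i][j]. All matrices are n x n, stored in row-major order. */
static void block_2x2(size_t n, const double *A, const double *B, double *C,
                      size_t i, size_t j)
{
    /* Four independent accumulators: no addition depends on another
     * accumulator's result, so a pipelined adder can overlap them. */
    double acc00 = 0.0, acc01 = 0.0, acc10 = 0.0, acc11 = 0.0;

    for (size_t k = 0; k < n; k++) {
        /* Four loads per iteration... */
        double a0 = A[i * n + k];
        double a1 = A[(i + 1) * n + k];
        double b0 = B[k * n + j];
        double b1 = B[k * n + j + 1];

        /* ...feed four multiply-adds, so each loaded value is used twice. */
        acc00 += a0 * b0;
        acc01 += a0 * b1;
        acc10 += a1 * b0;
        acc11 += a1 * b1;
    }

    C[i * n + j]           = acc00;
    C[i * n + j + 1]       = acc01;
    C[(i + 1) * n + j]     = acc10;
    C[(i + 1) * n + j + 1] = acc11;
}

/* Assumption for this sketch: n is even, so the matrix divides evenly
 * into 2x2 blocks and no cleanup loop is needed. */
void matmul_2x2_blocked(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i += 2)
        for (size_t j = 0; j < n; j += 2)
            block_2x2(n, A, B, C, i, j);
}
```

Whether a compiler actually keeps the four accumulators in registers and issues fused multiply–add instructions depends on the target and the optimization settings; the sketch only shows the shape of the loop.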