Approximate computing: Difference between revisions

Content deleted Content added
OAbot (talk | contribs)
m Open access bot: hdl added to citation with #oabot.
Citation bot (talk | contribs)
Added bibcode. Removed URL that duplicated identifier. Removed parameters. | Use this bot. Report bugs. | Suggested by Headbomb | Linked from Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Sandbox | #UCB_webform_linked 163/1032
 
(44 intermediate revisions by 20 users not shown)
Line 1:
{{Short description|Computation of nearly accurate results}}
'''Approximate computing''' is an emerging paradigm for energy-efficient and/or high-performance design.<ref>J. Han and M. Orshansky, "[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.701.4955&rep=rep1&type=pdf Approximate computing: An emerging paradigm for energy-efficient design]", in the 18th IEEE European Test Symposium, pp. 1-6, 2013.</ref> It includes a plethora of computation techniques that return a possibly inaccurate result rather than a guaranteed accurate result, and that can be used for applications where an approximate result is sufficient for its purpose.<ref name="surveyACT">{{cite journal|last1=Mittal|first1=Sparsh|title=A Survey of Techniques for Approximate Computing|journal=ACM Comput. Surv.|date=May 2016|volume=48|issue=4|pages=62:1–62:33|doi=10.1145/2893356|publisher=ACM|language=en|url=https://zenodo.org/record/1236172}}</ref><ref>A. Sampson, et al. "[http://dl.acm.org/citation.cfm?id=1993518 EnerJ: Approximate data types for safe and general low-power computation]", In ACM SIGPLAN Notices, vol. 46, no. 6, 2011.</ref> One example of such situation is for a search engine where no exact answer may exist for a certain search query and hence, many answers may be acceptable. Similarly, occasional dropping of some [[Frame (video)|frames]] in a video application can go undetected due to perceptual limitations of humans. Approximate computing is based on the observation that in many scenarios, although performing exact computation requires large amount of resources, allowing [[Approximation theory|bounded approximation]] can provide disproportionate gains in performance and energy, while still achieving acceptable result accuracy.{{clarify|date=January 2016}} For example, in [[k-means clustering|''k''-means clustering]] algorithm, allowing only 5% loss in classification accuracy can provide 50 times energy saving compared to the fully accurate classification.<ref name="surveyACT"/>
 
The key requirement in approximate computing is that approximation can be introduced only in non-critical data, since approximating critical data (e.g., control operations) can lead to disastrous consequences, such as [[program crash]] or erroneous output.
Line 7 ⟶ 8:
 
; Approximate circuits
: Approximate arithmetic circuits:<ref>Jiang et al., "[http://www.ece.ualberta.ca/~jhan8/publications/survey.pdf Approximate Arithmetic Circuits: A Survey, Characterization, and Recent Applications]", the Proceedings of the IEEE, Vol. 108, No. 12, pp. 2108 - 2135, 2020.</ref> [[Adder (electronics)|adders]],<ref>J. Echavarria, et al. "FAU: Fast and Error-Optimized Approximate Adder Units on LUT-Based FPGAs", FPT, 2016.</ref><ref>J. Miao, et al. "[https://repositories.lib.utexas.edu/bitstream/handle/2152/19706/miao_thesis_201291.pdf?sequence=1 Modeling and synthesis of quality-energy optimal approximate adders]", ICCAD, 2012</ref> [[Binary multiplier|multipliers]]<ref>{{Cite book|lastlast1=Rehman|firstfirst1=Semeen|last2=El-Harouni|first2=Walaa|last3=Shafique|first3=Muhammad|last4=Kumar|first4=Akash|last5=Henkel|first5=Jörg|date=2016-11-07|title=Architectural-space exploration of approximate multipliers|publisher=ACM|pages=80|doi=10.1145/2966986.2967005|isbn=9781450344661|s2cid=5326133}}</ref> and other [[logical circuit]]s can reduce hardware overhead.<ref>S. Venkataramani, et al. "[http://algos.inesc-id.pt/projectos/approx/FCT/Ref-15.pdf SALSA: systematic logic synthesis of approximate circuits]", DAC, 2012.</ref><ref>J. Miao, et al. "[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.453.7870&rep=rep1&type=pdf Approximate logic synthesis under general error magnitude and frequency constraints]", ICCAD, 2013</ref><ref>R. Hegde et al. "[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.33.8929&rep=rep1&type=pdf Energy-efficient signal processing via algorithmic noise-tolerance]", ISLPED, 1999.</ref> For example, an approximate multi-bit adder can ignore the [[carry chain]] and thus, allow all its sub-adders to perform addition operation in parallel.<ref>{{Cite journal |last1=Camus |first1=Vincent |last2=Mei |first2=Linyan |last3=Enz |first3=Christian |last4=Verhelst |first4=Marian |date=December 2019 |title=Review and Benchmarking of Precision-Scalable Multiply-Accumulate Unit Architectures for Embedded Neural-Network Processing |journal=IEEE Journal on Emerging and Selected Topics in Circuits and Systems |volume=9 |issue=4 |pages=697–711 |doi=10.1109/JETCAS.2019.2950386 |issn=2156-3357 |quote="The implementation chosen in this study assumes a rightshifting sequential multiplier as it requires a smaller firststage adder than a left-shifting design, preventing long carry propagation and sign-bit extension."|doi-access=free |bibcode=2019IJEST...9..697C }}</ref><ref>{{Cite journal |last1=Nagornov |first1=Nikolay N. |last2=Lyakhov |first2=Pavel A. |last3=Bergerman |first3=Maxim V. |last4=Kalita |first4=Diana I. |date=2024 |title=Modern Trends in Improving the Technical Characteristics of Devices and Systems for Digital Image Processing |journal=IEEE Access |volume=12 |pages=44659–44681 |doi=10.1109/ACCESS.2024.3381493 |issn=2169-3536 |quote="Addition and accumulation of high order bits are not performed until the partial product reduction for the next multiplication in the proposed architecture."|doi-access=free |bibcode=2024IEEEA..1244659N }}</ref>
; Approximate storage and memory
: Instead of [[Computer data storage|storing data]] values exactly, they can be stored approximately, e.g., by [[Data truncation|truncating]] the lower-bits in [[floating point]] data. Another method is to accept less reliable memory. For this, in [[DRAM]]<ref>{{Cite journal|lastlast1=Raha|firstfirst1=A.|last2=Sutar|first2=S.|last3=Jayakumar|first3=H.|last4=Raghunathan|first4=V.|date=July 2017|title=Quality Configurable Approximate DRAM|journal=IEEE Transactions on Computers|volume=66|issue=7|pages=1172–1187|doi=10.1109/TC.2016.2640296|bibcode=2017ITCmp..66.1172R |issn=0018-9340|doi-access=free}}</ref> and [[eDRAM]], [[refresh rate]] assignments can be lowered or controlled.<ref>{{Cite journalbook|lastlast1=Kim|firstfirst1=Yongjune|last2=Choi|first2=Won Ho|last3=Guyot|first3=Cyril|last4=Cassuto|first4=Yuval|datetitle=December2019 IEEE Global Communications Conference (GLOBECOM) 2019|titlechapter=On the Optimal Refresh Power Allocation for Energy-Efficient Memories |url=https://ieeexplore.ieee.org/document/9013465/|journaldate=December 2019 IEEE Global Communications Conference (GLOBECOM)|___location=Waikoloa, HI, USA|publisher=IEEE|pages=1–6|doi=10.1109/GLOBECOM38437.2019.9013465|isbn=978-1-7281-0962-6|arxiv=1907.01112|s2cid=195776538}}</ref> In [[Static random-access memory|SRAM]], supply voltage can be lowered<ref>{{Cite journal|lastlast1=Frustaci|firstfirst1=Fabio|last2=Blaauw|first2=David|author2-link=David Blaauw|last3=Sylvester|first3=Dennis|last4=Alioto|first4=Massimo|date=June 2016|title=Approximate SRAMs With Dynamic Energy-Quality Management|url=https://ieeexplore.ieee.org/document/7372479/|journal=IEEE Transactions on Very Large Scale Integration (VLSI) Systems|volume=24|issue=6|pages=2128–2141|doi=10.1109/TVLSI.2015.2503733|bibcode=2016ITVL...24.2128F |s2cid=8051173|issn=1063-8210}}</ref> or controlled.<ref>{{Cite journal|lastlast1=Kim|firstfirst1=Yongjune|last2=Kang|first2=Mingu|last3=Varshney|first3=Lav R.|last4=Shanbhag|first4=Naresh R.|date=2018|title=Generalized Water-filling for Source-aware Energy-efficient SRAMs|url=https://ieeexplore.ieee.org/document/8368137/|journal=IEEE Transactions on Communications|volume=66|issue=10|pages=1–14826–4841|doi=10.1109/TCOMM.2018.2841406|issn=0090-6778|arxiv=1710.07153|bibcode=2018ITCom..66.4826K |s2cid=24512949}}</ref> Approximate storage can be applied to reduce [[Magnetoresistive random-access memory|MRAM]]'s high write energy consumption.<ref>{{Cite journal |lastlast1=Kim |firstfirst1=Yongjune |last2=Jeon |first2=Yoocharn |last3=GuyotChoi |first3=CyrilHyeokjin |last4=CassutoGuyot |first4=Cyril |last5=Cassuto |first5=Yuval |date=June2022 2020|title=Optimizing the Write Fidelity of MRAMs|url=https://ieeexplore.ieee.org/document/9173990/ by Alternating Water-filling Algorithm |journal=2020 IEEE International SymposiumTransactions on InformationCommunications Theory (ISIT)|___locationvolume=Los70 Angeles, CA, USA|publisherissue=IEEE9 |pages=792–7975825–5836 |doi=10.1109/ISIT44484TCOMM.20202022.91739903190868 |isbnbibcode=978-1-7281-6432-82022ITCom..70.5825K |arxivs2cid=2001.03803250565077 |issn=0090-6778}}</ref> In general, any [[error detection and correction]] mechanisms should be disabled.
; Software-level approximation
: There are several ways to approximate at software level. [[Memoization]] or fuzzy memoization (the use of a [[vector database]] for approximate retrieval from a cache, ''i.e.'' fuzzy caching) can be applied. Some [[iteration]]s of [[Loop (computing)|loops]] can be skipped (termed as [[loop perforation]]) to achieve a result faster. Some tasks can also be skipped, for example when a run-time condition suggests that those tasks are not going to be useful ([[task skipping]]). [[Monte Carlo algorithm]]s and [[Randomized algorithm]]s trade correctness for execution time guarantees.<ref>C.Alippi, Intelligence for Embedded Systems: a Methodological approach, Springer, 2014, pp. 283</ref> The computation can be reformulated according to paradigms that allow easily the acceleration on specialized hardware, e.g. a neural processing unit.<ref>{{Cite conference|last1=Esmaeilzadeh|first1=Hadi|last2=Sampson|first2=Adrian|last3=Ceze|first3=Luis|last4=Burger|first4=Doug|title=Neural acceleration for general-purpose approximate programs|conference=45th Annual IEEE/ACM International Symposium on Microarchitecture|year=2012|doi=10.1109/MICRO.2012.48|publisher=IEEE|pages=449–460|___location=Vancouver, BC}}</ref>
; Approximate system
: In an approximate system,<ref>{{Cite journalbook|lastlast1=Raha|firstfirst1=Arnab|last2=Raghunathan|first2=Vijay|date=2017|title=Towards Full-System Energy-Accuracy Tradeoffs: A Case Study of An Approximate Smart Camera System|journal=Proceedings of the 54th Annual Design Automation Conference 2017 |chapter=Towards Full-System Energy-Accuracy Tradeoffs |date=2017|series=DAC '17|___location=New York, NY, USA|publisher=ACM|pages=74:1–74:6|doi=10.1145/3061639.3062333|isbn=9781450349277|s2cid=2503638}}</ref><ref>{{Cite journal |last1=Ghosh |first1=Soumendu Kumar |last2=Raha |first2=Arnab |last3=Raghunathan |first3=Vijay |date=2023-07-24 |title=Energy-Efficient Approximate Edge Inference Systems |url=https://dl.acm.org/doi/10.1145/3589766 |journal=ACM Transactions on Embedded Computing Systems |volume=22 |issue=4 |pages=77:1–77:50 |doi=10.1145/3589766 |issn=1539-9087|url-access=subscription }}</ref> different subsystems of the system such as the processor, memory, sensor, and communication modules are synergistically approximated to obtain a much better system-level Q-E trade-off curve compared to individual approximations to each of the subsystems.
 
==Application areas==
Approximate computing has been used in a variety of domains where the applications are error-tolerant, such as [[multimedia]] processing, [[machine learning]], [[signal processing]], [[Computational science|scientific computing]]. Therefore, etc.approximate Googlecomputing is usingmostly thisdriven approachby inapplications theirthat [[Tensorare processingrelated unit]]sto (TPUhuman perception/cognition and have inherent error resilience. Many of these applications are based on statistical or probabilistic computation, asuch customas ASIC)different approximations can be made to better suit the desired objectives.<ref>{{cite journal |last1=Liu |first1=Weiqiang |last2=Lombardi |first2=Fabrizio |last3=Schulte |first3=Michael |title=Approximate Computing: From Circuits to Applications |journal=Proceedings of the IEEE |date=Dec 2020 |volume=108 |issue=12 |page=2103 |doi=10.1109/JPROC.2020.3033361 |bibcode=2020IEEEP.108.2103L |doi-access=free }}</ref>
One notable application in [[machine learning]] is that Google is using this approach in their [[Tensor processing unit]]s (TPU, a custom [[application-specific integrated circuit|ASIC]]).<ref>{{cite journal |last1=Liu |first1=Weiqiang |last2=Lombardi |first2=Fabrizio |last3=Schulte |first3=Michael |title=Approximate Computing: From Circuits to Applications |journal=Proceedings of the IEEE |date=Dec 2020 |volume=108 |issue=12 |page=2104 |doi=10.1109/JPROC.2020.3033361 |bibcode=2020IEEEP.108.2103L |doi-access=free }}</ref>
 
==Derived paradigms==
The main issue in approximate computing is the identification of the section of the application that can be approximated. In the case of large scale applications, it is very common to find people holding the expertise on approximate computing techniques not having enough expertise on the application ___domain (and vice versa). In order to solve this problem, [[programming paradigm]]s<ref>{{cite journalbook|last1=Nguyen|first1=Donald|last2=Lenharth|first2=Andrew|last3=Pingali|first3=Keshav|title=A lightweight infrastructure for graph analytics|journal=Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles|date=2013|pages=456–471|publisher=ACM|language=en|url=https://dl.acm.org/ft_gateway.cfm?id=2522739|doi=10.1145/2517349.2522739|isbn=9781450323888|doi-access=free}}</ref><ref>{{cite journal|last1chapter=Silvano|first1=Cristina|last2=Agosta|first2=Giovanni|last3=Cherubin|first3=Stefano|last4=Gadioli|first4=Davide|last5=Palermo|first5=Gianluca|last6=Bartolini|first6=Andrea|last7=Benini|first7=Luca|last8=Martinovič|first8=Jan|last9=Palkovič|first9=Martin|last10=Slaninová|first10=Kateřina|last11=Bispo|first11=João|last12=Cardoso|first12=JoãoA M.lightweight P.|last13=Rui|first13=Abreu|last14=Pinto|first14=Pedro|last15=Cavazzoni|first15=Carlo|last16=Sanna|first16=Nico|last17=Beccari|first17=Andrea R.|last18=Cmar|first18=Radim|last19=Rohou|first19=Erven|title=The ANTAREX approach to autotuning and adaptivityinfrastructure for energygraph efficientanalytics HPC systems|journal=Proceedings of the ACM International Conference on Computing Frontiers|date=20162013|pages=288–293456–471|publisher=ACM|language=en|url=https://repositorio.inesctec.pt/bitstream/123456789/6389/1/P-00K-H7T.pdf|doi=10.1145/29031502517349.2903470|hdl=11585/5882562522739|isbn=97814503412889781450323888|hdldoi-access=free}}</ref> have been proposed. They all have in common the clear role separation between application [[programmer]] and application [[___domain expert]]. These approaches allow the spread of the most common [[Program optimization|optimizations]] and approximate computing techniques.
 
==See also==