{{Short description|Computer architecture designed for a specific task}}
A '''domain-specific architecture (DSA)''' is a programmable [[computer architecture]] specifically tailored to operate very efficiently within the confines of a given application domain. The term is often used in contrast to general-purpose architectures, such as [[CPU]]
== History ==
In conjunction with the [[Semiconductor device|semiconductor]] boom that started in the 1960s, computer architects were tasked with finding new ways to exploit the increasingly large number of transistors available. [[Moore's law|Moore's Law]] and [[Dennard scaling|Dennard Scaling]] enabled architects to focus on improving the performance of general-purpose [[Microprocessor|microprocessors]] on general-purpose programs.<ref>{{Cite journal |last=Moore |first=G.E. |date=January 1998 |title=Cramming More Components Onto Integrated Circuits |url=http://dx.doi.org/10.1109/jproc.1998.658762 |journal=Proceedings of the IEEE |volume=86 |issue=1 |pages=82–85 |doi=10.1109/jproc.1998.658762 |issn=0018-9219|url-access=subscription }}</ref><ref>{{Cite journal |
These efforts yielded several technological innovations, such as [[Multi-level cache|multi-level caches]], [[out-of-order execution]], deep instruction [[Instruction pipelining|pipelines]], [[Multithreading (computer architecture)|multithreading]], and [[multiprocessing]]. The impact of these innovations was measured on generalist [[Benchmarks in computation|benchmarks]] such as [[SPEC]], and architects were not concerned with the internal structure or specific characteristics of these programs.<ref name=":0">{{Cite book |
The end of Dennard Scaling pushed computer architects to switch from a single, very fast processor to several [[Multi-core processor|processor cores]]. Performance improvement could no longer be achieved by simply increasing the operating frequency of a single core.<ref>{{Cite web |last=Schauer |first=Bryan |title=Multicore Processors – A Necessity |url=http://www.csa.com/discoveryguides/multicore/review.pdf |archive-url=https://web.archive.org/web/20111125035151/http://www.csa.com/discoveryguides/multicore/review.pdf |archive-date=2011-11-25 |access-date=2023-07-06 |website=}}</ref>
The end of Moore's Law shifted the focus away from general-purpose architectures towards more specialized hardware. Although general-purpose CPUs will likely have a place in any computer system, [[Heterogeneous System Architecture|heterogeneous systems]] composed of general-purpose and domain-specific components are the most recent trend for achieving high performance.
While [[Hardware acceleration|hardware accelerators]] and [[Application-specific integrated circuit|ASICs]] have been used in very specialized application domains since the inception of the semiconductor industry, they generally implement a specific function with very limited flexibility. In contrast, the shift towards domain-specific architectures aims to achieve a better balance of flexibility and specialization.<ref>{{Cite book |last=Barr |first=Keith Elliott |title=ASIC design in the silicon sandbox: a complete guide to building mixed-signal integrated circuits |date=2007 |publisher=McGraw-Hill |isbn=978-0-07-148161-8 |location=New York}}</ref>
== Guidelines for DSA design ==
[[John L. Hennessy|John Hennessy]] and [[David Patterson (computer scientist)|David Patterson]] outlined five principles for DSA design that lead to better area efficiency and energy savings. The objective in these types of architecture is often also to reduce the Non-Recurring Engineering (NRE) costs so that the investment in a specialized solution can be more easily amortized.<ref name=":0" />
Moving data in general-purpose [[Memory hierarchy|memory hierarchies]] consumes a considerable amount of energy, because these hierarchies are built to minimize data-access latency. In the case of domain-specific architectures, it is expected that understanding of the application domains by hardware and [[compiler]] designers allows for simpler and specialized memory hierarchies, where the data movement is largely handled in software, with tailor-made memories for specific functions within the domain.<ref name=":0" />
Since a considerable amount of hardware resources can be saved by dropping general-purpose architectural optimizations such as out-of-order execution, [[Prefetching (computing)|prefetching]], address [[Coalescing (computer science)|coalescing]], and hardware speculation, the resources saved should be re-invested to maximally exploit the available [[Parallelism (computing)|parallelism]], for example by adding more arithmetic units, or to solve any [[memory bandwidth]] issues by adding bigger memories.<ref name=":0" />
Since the target application domains almost always present an inherent form of parallelism, it is important to decide how to take advantage of this parallelism and expose it to the software. If, for example, a [[Simd|SIMD]] architecture can work in the domain, it would be easier for the programmer to use than a [[MIMD]] architecture.<ref name=":0" />
Whenever possible, using narrower and simpler [[Data type|data types]] yields several advantages. For example, it reduces the cost of moving data for [[Memory-bound function|memory-bound]] applications, and it can also reduce the amount of resources required to implement the respective arithmetic units.<ref name=":0" />
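As a rough illustration of the data-movement saving, the following Python sketch stores the same values once as 32-bit floats and once as 8-bit integers. The symmetric scaling scheme, the sample values, and the names <code>quantize_int8</code> and <code>dequantize</code> are assumptions made for this example and are not prescribed by the cited guidelines.

```python
from array import array


def quantize_int8(values):
    """Map floats to signed 8-bit integers using one shared scale factor.

    Hypothetical helper: symmetric quantization is only one of several
    possible schemes and is assumed here for illustration.
    """
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    quantized = array("b", (round(v / scale) for v in values))
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the 8-bit representation."""
    return [q * scale for q in quantized]


# Illustrative values only; they are not taken from any real workload.
weights = [0.5, -1.0, 0.25, 0.75]

as_float32 = array("f", weights)
as_int8, scale = quantize_int8(weights)

# Bytes that would have to be moved for each representation.
bytes_float32 = len(as_float32) * as_float32.itemsize
bytes_int8 = len(as_int8) * as_int8.itemsize

# Worst-case error introduced by the narrower type.
max_error = max(abs(w - r) for w, r in zip(weights, dequantize(as_int8, scale)))
```

Under these assumptions the 8-bit copy occupies a quarter of the bytes of the 32-bit copy, at the cost of a rounding error of at most half a quantization step per value. Whether that error is acceptable depends on the application domain.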
{{See also|Domain-specific language}}
One of the challenges for DSAs is ease of use, and more specifically, being able to effectively program the architecture and run applications on it. Whenever possible, it is advised to use existing [[Domain-specific language|Domain-Specific Language]]<nowiki/>s (DSL) such as [[Halide (programming language)|Halide]]<ref>{{Cite web |last=Ragan-Kelley |first=Jonathan |title=Halide |url=https://halide-lang.org/ |access-date=2023-07-06 |website=halide-lang.org |language=en}}</ref> and [[TensorFlow]]<ref>{{Cite web |title=TensorFlow |url=https://www.tensorflow.org/ |access-date=2023-07-06 |website=TensorFlow |language=en}}</ref> to more easily program a DSA. Re-use of existing compiler toolchains and software frameworks makes using a new DSA significantly more accessible.<ref name=":0" />
== DSA for deep neural networks ==
One of the application domains where
{{Infobox CPU architecture
=== TPU ===
{{See also|Tensor Processing Unit}}
Google's
The TPU was designed to be a [[Coprocessor|co-processor]] communicating via a [[PCI Express|PCIe]] bus, to be easily incorporated in existing servers. It is primarily a [[Matrix multiplication|matrix-multiplication]] engine following a CISC (Complex Instruction Set Computer) [[Instruction set architecture|ISA]]. The multiplication engine uses [[Systolic array|systolic execution]] to save energy, reducing the number of writes to [[Volatile memory|SRAM]].<ref name=":2">{{Cite book |
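The idea behind systolic execution can be sketched with a small cycle-by-cycle simulation in Python. This is a generic, output-stationary textbook arrangement assumed for illustration; it is not a model of the TPU's actual datapath, and the function name <code>systolic_matmul</code> and the fetch counter are hypothetical. Each processing element passes its operands to a neighbour instead of returning them to memory, so every input value is fetched from memory only once.

```python
def systolic_matmul(a, b):
    """Multiply a (n x k) by b (k x m) on a simulated n x m systolic grid.

    Rows of `a` enter from the left and columns of `b` enter from the top,
    each skewed by one cycle per row or column. Returns the product and the
    number of operand fetches from memory.
    """
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]    # one accumulator per element
    a_reg = [[0] * m for _ in range(n)]  # operand moving rightwards
    b_reg = [[0] * m for _ in range(n)]  # operand moving downwards
    fetches = 0

    for t in range(n + m + k - 2):       # cycles until the grid drains
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j > 0:
                    new_a[i][j] = a_reg[i][j - 1]   # from left neighbour
                elif 0 <= t - i < k:
                    new_a[i][j] = a[i][t - i]       # fetched from memory
                    fetches += 1
                if i > 0:
                    new_b[i][j] = b_reg[i - 1][j]   # from upper neighbour
                elif 0 <= t - j < k:
                    new_b[i][j] = b[t - j][j]       # fetched from memory
                    fetches += 1
                acc[i][j] += new_a[i][j] * new_b[i][j]
        a_reg, b_reg = new_a, new_b

    return acc, fetches


product, fetches = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

In this sketch the number of fetches is n·k + k·m, whereas a loop that reloads both operands for every multiply-accumulate would perform 2·n·m·k loads. The energy figures of a real design depend on details that the simulation does not capture.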
The TPU was fabricated with a 28-nm process
The TPU computes
=== Microsoft Catapult ===
[[Microsoft]]'s Project Catapult<ref>{{Cite web |title=Project Catapult |url=https://www.microsoft.com/en-us/research/project/project-catapult/ |access-date=2023-07-06 |website=Microsoft Research |language=en-US}}</ref> put an [[Field-programmable gate array|FPGA]] connected through a PCIe bus into data center servers, with the idea of leveraging the reconfiguration capabilities of FPGAs to accelerate many different applications running on the server.
Differently from Google's TPU, the Catapult FPGA needed to be programmed via [[Hardware description language|hardware-description
Microsoft designed a [[Convolutional neural network|CNN]] accelerator for the Catapult framework that was
=== NVDLA ===
=== Pixel Visual Core ===
{{See also|Pixel Visual Core}}
The Pixel Visual Core (PVC) is a series of [[ARM architecture|ARM-based]] [[Image processor|image processors]] designed by [[Google]]. The PVC is a fully programmable [[Image processor|image]], [[Vision processing unit|vision]] and [[AI accelerator|AI]] multi-core domain-specific architecture (DSA) for mobile devices and, in the future, for [[Internet of things|IoT]]. It first appeared in the [[Google Pixel 2|Google Pixel 2 and 2 XL]], which were introduced on October 19, 2017. It has also appeared in the [[Google Pixel 3|Google Pixel 3 and 3 XL]]. Starting with the [[Pixel 4]], this chip was replaced with the [[Pixel Neural Core]].<ref>{{Cite web |last=Cutress |first=Ian |title=Hot Chips 2018: The Google Pixel Visual Core Live Blog (10am PT, 5pm UTC) |url=https://www.anandtech.com/show/13241/hot-chips-2018-the-google-pixel-visual-core-live-blog |archive-url=https://web.archive.org/web/20180820204207/https://www.anandtech.com/show/13241/hot-chips-2018-the-google-pixel-visual-core-live-blog |url-status=dead |archive-date=August 20, 2018 |access-date=2023-07-07 |website=www.anandtech.com}}</ref>
=== Anton3 ===
[[File:Anton3 CoreTiles and EdgeTile.svg|thumb|upright=1.5|The architecture of the Anton3 specialized cores. Geometry Cores carry out general-purpose computation, while specialized hardware accelerates force-field computation.]]
Anton3 is a
== References ==
<references />
== Further reading ==
* John L. Hennessy (Stanford University) and David A. Patterson (University of California, Berkeley). ''Computer Architecture: A Quantitative Approach''. Sixth Edition.
== See also ==
* [[Hardware acceleration|Hardware Accelerator]]
* [[AI accelerator|AI Accelerator]]
* [[Application-specific integrated circuit|ASIC]]
* [[Field-programmable gate array|FPGA]]
[[Category:Computer architecture]]