Floating-point unit: Difference between revisions

Revision as of 13:18, 16 November 2023 by Gah4 (→History: not for floating point in general, but transcendental functions), compared with the latest revision as of 15:17, 2 April 2025 by Vincent Lefèvre (minor edit: punctuation and typography)
(16 intermediate revisions by 11 users not shown)
{{short description|Part of a computer system}}
[[File:X87 FPUs.jpg|Collection of the [[x87]] family of math coprocessors by [[Intel]]|thumb|upright=1]]
A '''floating-point unit''' ('''FPU'''), also called a '''numeric processing unit''' ('''NPU''')<ref>{{cite web |url=https://www.computinghistory.org.uk/det/35216/Intel-80287XL-Numeric-Processing-Unit/ |title=Intel 80287XL Numeric Processing Unit |website=computinghistory.org.uk |access-date=2024-11-02}}</ref> or colloquially a '''math coprocessor''', is a part of a [[computer]] system specially designed to carry out operations on [[Floating-point arithmetic|floating-point]] numbers.<ref>{{Cite journal |author-last1=Anderson |author-first1=Stanley F. |author-last2=Earle |author-first2=John G. |author-last3=Goldschmidt |author-first3=Robert Elliott |author-last4=Powers |author-first4=Don M. |date=January 1967 |title=The IBM System/360 Model 91: Floating-Point Execution Unit |journal=[[IBM Journal of Research and Development]] |volume=11 |issue=1 |pages=34–53 |doi=10.1147/rd.111.0034 |issn=0018-8646}}</ref> Typical operations are [[addition]], [[subtraction]], [[multiplication]], [[division (mathematics)|division]], and [[square root]]. Modern designs generally also include a [[fused multiply-add]] instruction, because multiplying two numbers and adding the product to a third was found to be very common in real-world code.

Some FPUs can also perform various [[transcendental function]]s such as [[Exponential function|exponential]] or [[trigonometric]] calculations, but their accuracy can be low,<ref>{{cite web |last=Dawson |first=Bruce |date=2014-10-09 |title=Intel Underestimates Error Bounds by 1.3 quintillion |url=https://randomascii.wordpress.com/2014/10/09/intel-underestimates-error-bounds-by-1-3-quintillion/ |access-date=2020-01-16 |website=randomascii.wordpress.com}}</ref><ref>{{cite web |url=https://software.intel.com/en-us/blogs/2014/10/09/fsin-documentation-improvements-in-the-intel-64-and-ia-32-architectures-software |title=FSIN Documentation Improvements in the "Intel® 64 and IA-32 Architectures Software Developer's Manual" |website=intel.com |date=2014-10-09 |access-date=2020-01-16 |archive-url=https://web.archive.org/web/20200116083121/https://software.intel.com/en-us/blogs/2014/10/09/fsin-documentation-improvements-in-the-intel-64-and-ia-32-architectures-software |archive-date=2020-01-16 |url-status=dead}}</ref> so some systems prefer to compute these functions in software.

Floating-point operations were originally handled in [[software]] in early computers. Over time, manufacturers began to provide standardized floating-point libraries as part of their software collections. Some machines, particularly those dedicated to scientific processing, included specialized hardware to perform some of these tasks much faster. The introduction of [[microcode]] in the 1960s allowed these instructions to be included in the system's [[instruction set architecture]] (ISA). Normally the microcode would decode them into a series of instructions similar to the library routines, but on machines with an FPU they were instead routed to that unit, which performed them much faster.
This allowed floating-point instructions to become universal while the floating-point hardware remained optional; for instance, on the [[PDP-11]] one could add the floating-point processor unit at any time using plug-in [[expansion card]]s.

The introduction of the [[microprocessor]] in the 1970s led to an evolution similar to that of the earlier [[mainframe]]s and [[minicomputer]]s. Early [[microcomputer]] systems performed floating point in software, typically in a vendor-specific library included in [[ROM]]. Dedicated single-chip FPUs began to appear late in the decade, but they remained rare in real-world systems until the mid-1980s, and using them required software to be rewritten to call them. As they became more common, the software libraries were modified to work like the microcode of earlier machines, performing the instructions on the main CPU if needed but offloading them to the FPU if one was present. By the late 1980s, [[semiconductor manufacturing]] had improved to the point where it became possible to include an FPU with the main CPU, resulting in designs like the [[i486]] and [[68040]].
These designs were said to have an "integrated FPU", and from the mid-1990s FPUs were a standard feature of most CPU designs except low-cost designs intended as [[embedded processor]]s.

In general-purpose [[computer architecture]]s, one or more FPUs may be integrated as [[execution unit]]s within the [[central processing unit]]; however, many embedded processors lack hardware support for floating-point operations, although such support is increasingly standard.

In modern designs, a single CPU will typically include several [[arithmetic logic unit]]s (ALUs) and several FPUs, reading many instructions at the same time and routing them to the various units for parallel execution. By the 2000s, even embedded processors generally included an FPU as well.

When a CPU is executing a program that calls for a floating-point operation, there are three ways to carry it out:
* A floating-point unit emulator (a floating-point library in software)
* Add-on FPU hardware
* Integrated FPU (in hardware)

== History ==
In 1963, the [[GE-200 series|GE-235]] featured an "Auxiliary Arithmetic Unit" for floating-point and double-precision calculations.<ref>{{cite web |title=GE-2xx documents |url=http://www.bitsavers.org/pdf/ge/GE-2xx/ |website=www.bitsavers.org |at=[http://www.bitsavers.org/pdf/ge/GE-2xx/CPB-267_GE-235-SystemManual_1963.pdf CPB-267_GE-235-SystemManual_1963.pdf], p. IV-4}}</ref>

Historically, some systems implemented [[Floating-point arithmetic|floating point]] with a [[coprocessor]] rather than as an integrated unit; such a coprocessor could be a single [[integrated circuit]], an entire [[Printed circuit board|circuit board]] or a cabinet. Today, [[graphics processing unit|GPUs]], which are coprocessors not always built into the CPU, generally have FPUs, although the first generations of GPUs did not.

Where no floating-point hardware is provided, floating-point calculations are done in software, which takes more processor time but avoids the cost of the extra hardware. For a particular computer architecture, the floating-point unit instructions may be [[Emulator|emulated]] by a library of software functions; this may permit the same [[object code]] to run on systems with or without floating-point hardware. Emulation can be implemented at any of several levels: in the CPU as [[microcode]], as an [[operating system]] function, or in [[user-space]] code. When only integer functionality is available, [[CORDIC]] methods are most commonly used for [[transcendental function]] evaluation.{{Citation needed|reason=When a fast integer multiply is available, this can be surprising.|date=November 2023}}

In most modern computer architectures, there is some separation between floating-point and [[integer]] operations. This separation varies significantly by architecture; some have dedicated floating-point registers, while some, like [[x86|Intel x86]], go as far as independent [[computer clock|clocking]] schemes.<ref>{{Cite web |url=http://www.cpu-world.com/CPUs/80287/index.html |title=Intel 80287 family |website=www.cpu-world.com |access-date=2019-01-15}}</ref>

Floating-point operations are often [[instruction pipelining|pipelined]]. In earlier [[superscalar]] architectures without general [[out-of-order execution]], floating-point operations were sometimes pipelined separately from integer operations. The modular [[Bulldozer (microarchitecture)|Bulldozer microarchitecture]] uses a special FPU named FlexFPU, which uses [[simultaneous multithreading]].
Each physical integer core, two per module, is single-threaded, in contrast with Intel's [[Hyperthreading]], where two virtual simultaneous threads share the resources of a single physical core.<ref>{{cite web |url=http://cdn3.wccftech.com/wp-content/uploads/2013/07/AMD-Steamroller-vs-Bulldozer.jpg |title=AMD Steamroller vs Bulldozer |website=WCCFtech |access-date=14 March 2022 |archive-url=https://web.archive.org/web/20150509204809/http://cdn3.wccftech.com/wp-content/uploads/2013/07/AMD-Steamroller-vs-Bulldozer.jpg |archive-date=9 May 2015 |url-status=dead}}</ref><ref>{{cite web |url=https://www.bit-tech.net/news/hardware/2010/10/28/amd-unveils-flex-fp/1 |date=28 October 2010 |first1=Gareth |last1=Halfacree |title=AMD unveils Flex FP |website=bit-tech.net |access-date=29 March 2018 |url-status=unfit |archive-url=https://web.archive.org/web/20170322014910/https://www.bit-tech.net/news/hardware/2010/10/28/amd-unveils-flex-fp/1 |archive-date=22 March 2017}}</ref>

== {{anchor|Floating-point emulation}}Floating-point library ==

== Add-on FPUs ==
{{Main|Coprocessor}}
Several models of the [[PDP-11]], such as the PDP-11/45,<ref>{{cite book|url=http://bitsavers.org/pdf/dec/pdp11/handbooks/PDP1145_Handbook_1973.pdf|title=PDP-11/45 Processor Handbook|at=Chapter 7 "Floating Point Processor"|date=1973|publisher=[[Digital Equipment Corporation]]}}</ref> PDP-11/34a,<ref name="1979-pdp-11-handbook">{{cite book|url=http://bitsavers.org/pdf/dec/pdp11/handbooks/PDP11_Handbook1979.pdf|title=PDP-11 Processor Handbook|date=1979|publisher=[[Digital Equipment Corporation]]}}</ref>{{rp|pages=184–185}} PDP-11/44,<ref name="1979-pdp-11-handbook" />{{rp|pages=195,211}} and PDP-11/70,<ref name="1979-pdp-11-handbook" />{{rp|pages=277,286–287}} supported an add-on floating-point unit for executing floating-point instructions.
The PDP-11/60,<ref name="1979-pdp-11-handbook" />{{rp|page=261}} MicroPDP-11/23<ref name="micro-PDP-11-handbook">{{cite book|url=http://bitsavers.org/pdf/dec/pdp11/handbooks/EB-24944-18_Micro_PDP-11_Handbook_1983-84.pdf|title=MICRO/PDP-11 Handbook|page=33|date=1983|publisher=[[Digital Equipment Corporation]]}}</ref> and several [[VAX]] models<ref>{{cite book |url=http://bitsavers.org/pdf/dec/vax/handbook/VAX_Hardware_Handbook_Volume_1_1986.pdf |title=VAX – Hardware Handbook Volume I – 1986 |date=1985 |publisher=[[Digital Equipment Corporation]] |language=en-us}}</ref><ref>{{cite book |url=http://bitsavers.org/pdf/dec/vax/handbook/VAX_Hardware_Handbook_Volume_2_1986.pdf |title=VAX – Hardware Handbook Volume II – 1986 |date=1986 |publisher=[[Digital Equipment Corporation]] |language=en-us}}</ref> could execute floating-point instructions without an add-on FPU (the MicroPDP-11/23 required an add-on microcode option),<ref name="micro-PDP-11-handbook" /> and offered add-on accelerators to further speed the execution of those instructions.

In the 1980s, it was common in [[IBM PC]] and compatible [[microcomputers]] for the FPU to be entirely separate from the [[Central processing unit|CPU]] and typically sold as an optional add-on. It would only be purchased if needed to speed up or enable math-intensive programs. The IBM PC, [[IBM Personal Computer XT|XT]], and most compatibles based on the 8088 or 8086 had a socket for the optional 8087 coprocessor. The [[IBM Personal Computer/AT|AT]] and other [[Intel 80286|80286]]-based systems were generally socketed for the [[x87#80287|80287]], and [[Intel 80386|80386/80386SX]]-based machines for the [[x87#80387|80387]] and [[Intel 80387SX|80387SX]] respectively, although early 80386 machines were socketed for the 80287, since the 80387 did not exist yet. Other companies, including [[Cyrix]] and [[Weitek]], also manufactured coprocessors for the Intel x86 series.

[[Acorn Computers]] chose the WE32206 to provide [[Single-precision floating-point format|single]], [[Double-precision floating-point format|double]] and [[extended precision]] arithmetic<ref>{{cite web |title=Western Electric 32206 co-processor |url=https://www.cpu-world.com/CPUs/32206/index.html |website=www.cpu-world.com |access-date=2021-11-06}}</ref> for its [[ARM architecture|ARM]]-powered [[Acorn Archimedes|Archimedes]] range, introducing a gate array to interface the ARM2 processor with the WE32206 and support the additional ARM floating-point instructions.<ref name="abcomputing199003_arm">{{ cite magazine | title=Programming The ARM: The Floating Point Co-processor | magazine=A&B Computing | last1=Fellows | first1=Paul | date=March 1990 | pages=43–44 }}</ref> Acorn later offered the FPA10 coprocessor, developed by ARM, for various machines fitted with the ARM3 processor.<ref name="acorn_fpa10">{{ cite press release | url=http://chrisacorns.computinghistory.org.uk/docs/Acorn/PR/FPA_release.txt | title=Acorn Releases Floating Point Accelerator | publisher=Acorn Computers Limited | date=5 July 1993 | access-date=7 April 2021 }}</ref>

Coprocessors were also available for the [[Motorola 68000 series|Motorola 68000 family]]: the [[Motorola 68881|68881 and 68882]]. These were common in [[Motorola 68020]]/[[Motorola 68030|68030]]-based [[workstation]]s, like the [[Sun-3]] series. They were also commonly added to higher-end models of the Apple [[Macintosh]] and Commodore [[Amiga]] series, but unlike in IBM PC-compatible systems, sockets for adding the coprocessor were less common in lower-end models.