Octuple-precision floating-point format: Difference between revisions

Revision as of 16:00, 16 May 2015 by Vincent Lefèvre (→Computer-language support: Removed a meaningless sentence.)
Latest revision as of 12:51, 3 August 2025 by C.Fred (m; Reverted edit by 2A01:CB16:201A:C683:0:67:3AB1:9A01 to last version by Vincent Lefèvre; Tag: Rollback)
(91 intermediate revisions by 46 users not shown)
Line 1:
{{short description|256-bit computer number format}}
{{use dmy dates|date=December 2022|cs1-dates=y}}
{{use list-defined references|date=December 2022}}
{{Floating-point}}
{{Computer architecture bit widths}}
In [[computing]], '''octuple precision''' is a binary [[floating-point]]-based [[computer number format]] that occupies [[32]] [[byte]]s ([[256]] [[bit]]s) in computer memory. This [[256]]-[[bit]] format is intended for applications that require results with higher than [[quadruple precision]]. The range greatly exceeds what is needed to describe all known physical limitations within the observable universe or precisions better than [[Planck units]].

== IEEE 754 octuple-precision binary floating-point format: binary256 ==
In its 2008 revision, the [[IEEE 754]] standard specifies a '''binary256''' format among the ''interchange formats'' (it is not a basic format), as having:
* [[Sign bit]]: 1 bit
* [[Exponent]] width: 19 bits
Line 16:
The format is written with an implicit lead bit with value 1 unless the exponent is all zeros. Thus only 236 bits of the [[significand]] appear in the memory format, but the total precision is 237 bits (approximately 71 decimal digits: {{nowrap|log<sub>10</sub>(2<sup>237</sup>) ≈ 71.344}}).
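The field widths above can be illustrated with a minimal Python sketch. It is not taken from the IEEE 754 standard or from any existing library: the names <code>split_fields</code> and <code>decode</code> are hypothetical, the exponent bias of 262143 (2<sup>18</sup> − 1) is the one defined for binary256, and exact rational arithmetic from the standard-library <code>fractions</code> module stands in for a real octuple-precision type.

```python
import math
from fractions import Fraction

# Field widths of binary256: 1 sign bit, 19 exponent bits, 236 stored significand bits.
EXP_BITS = 19
FRAC_BITS = 236
BIAS = (1 << (EXP_BITS - 1)) - 1   # 262143, the exponent bias
EXP_MAX = (1 << EXP_BITS) - 1      # 0x7FFFF, reserved for infinities and NaNs

# 237 bits of precision correspond to roughly 71.34 decimal digits.
DECIMAL_DIGITS = 237 * math.log10(2)


def split_fields(bits: int) -> tuple[int, int, int]:
    """Split a 256-bit pattern into (sign, biased exponent, stored significand)."""
    if not 0 <= bits < (1 << 256):
        raise ValueError("expected a 256-bit unsigned integer")
    sign = bits >> 255
    exponent = (bits >> FRAC_BITS) & EXP_MAX
    fraction = bits & ((1 << FRAC_BITS) - 1)
    return sign, exponent, fraction


def decode(bits: int):
    """Return the exact value as a Fraction, or a string for infinities and NaNs.

    Assumption of this sketch: the sign of zero is not preserved (-0 decodes to 0).
    """
    sign, exponent, fraction = split_fields(bits)
    if exponent == EXP_MAX:
        if fraction:
            return "NaN"
        return "-inf" if sign else "+inf"
    if exponent == 0:
        # Zero or subnormal: no implicit leading 1, exponent fixed at Emin = 1 - BIAS.
        magnitude = Fraction(fraction, 1 << (FRAC_BITS + BIAS - 1))
    else:
        # Normal number: implicit leading 1, true exponent = stored exponent - BIAS.
        significand = Fraction((1 << FRAC_BITS) | fraction, 1 << FRAC_BITS)
        magnitude = significand * Fraction(2) ** (exponent - BIAS)
    return -magnitude if sign else magnitude
```

For example, <code>decode(0x3FFFF << 236)</code> yields exactly 1, and <code>decode(1)</code> yields 2<sup>−262378</sup>, the smallest positive subnormal number. Rational arithmetic keeps every intermediate result exact, at the cost of speed.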
<!-- (Commented out since the image is incorrect; it could be re-added once corrected.)-->
The bits are laid out as follows:

[[File:Octuple precision visual demonstration.svg|1000px|Layout of octuple-precision floating-point format]]

=== Exponent encoding ===
The octuple-precision binary floating-point exponent is encoded using an [[offset binary]] representation, with a zero offset of 262143, known as the exponent bias in the IEEE 754 standard.
* E<sub>min</sub> = −262142
Line 30 ⟶ 28:
* [[Exponent bias]] = 3FFFF<sub>16</sub> = 262143

Thus, as defined by the offset binary representation, in order to get the true exponent the offset of 262143 has to be subtracted from the stored exponent.

The stored exponents 00000<sub>16</sub> and 7FFFF<sub>16</sub> are interpreted specially.

{| class="wikitable" style="text-align: center;"
|-
! Exponent !! Significand zero !! Significand non-zero !! Equation
|-
| 00000<sub>16</sub> || [[0 (number)|0]], [[−0]] || [[subnormal numbers]] || (−1)<sup>signbit</sup> × 2<sup>−262142</sup> × 0.significandbits<sub>2</sub>
|-
| 00001<sub>16</sub>, ..., 7FFFE<sub>16</sub> ||colspan=2| normalized value || (−1)<sup>signbit</sup> × 2<sup>exponent bits<sub>2</sub> − 262143</sup> × 1.significandbits<sub>2</sub>
|-
| 7FFFF<sub>16</sub> || ±[[infinity|∞]] || [[NaN]] (quiet, signaling)
|}
Line 49 ⟶ 48:
=== Octuple-precision examples ===
These examples are given in bit ''representation'', in [[hexadecimal]], of the floating-point value.
This includes the sign, (biased) exponent, and significand.

 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = +0
 8000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = −0

 7fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = +infinity
 ffff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = −infinity

 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001<sub>16</sub>
 = 2<sup>−262142</sup> × 2<sup>−236</sup> = 2<sup>−262378</sup>
 ≈ 2.24800708647703657297018614776265182597360918266100276294348974547709294462 × 10<sup>−78984</sup>
   (smallest positive subnormal number)

 0000 0fff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff<sub>16</sub>
 = 2<sup>−262142</sup> × (1 − 2<sup>−236</sup>)
 ≈ 2.4824279514643497882993282229138717236776877060796468692709532979137875392 × 10<sup>−78913</sup>
   (largest subnormal number)

 0000 1000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub>
 = 2<sup>−262142</sup>
 ≈ 2.48242795146434978829932822291387172367768770607964686927095329791378756168 × 10<sup>−78913</sup>
   (smallest positive normal number)

 7fff efff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff<sub>16</sub>
 = 2<sup>262143</sup> × (2 − 2<sup>−236</sup>)
 ≈ 1.61132571748576047361957211845200501064402387454966951747637125049607182699 × 10<sup>78913</sup>
   (largest normal number)

 3fff efff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff<sub>16</sub>
 = 1 − 2<sup>−237</sup>
 ≈ 0.999999999999999999999999999999999999999999999999999999999999999999999995472
   (largest number less than one)

 3fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub>
 = 1 (one)

 3fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001<sub>16</sub>
 = 1 + 2<sup>−236</sup>
 ≈ 1.00000000000000000000000000000000000000000000000000000000000000000000000906
   (smallest number larger than one)

By default, 1/3 rounds down, as in [[double precision]], because of the odd number of bits in the significand. The bits beyond the rounding point are <code>0101...</code>, which is less than 1/2 of a [[unit in the last place]].

== Implementations ==
Octuple precision is rarely implemented, since its usage is extremely rare. [[Apple Inc.]] had an implementation of addition, subtraction and multiplication of octuple-precision numbers with a 224-bit [[two's complement]] significand and a 32-bit exponent.<ref name="Crandall-Papadopoulos_2002"/> One can use general [[arbitrary-precision arithmetic]] libraries to obtain octuple (or higher) precision, but specialized octuple-precision implementations may achieve higher performance.

=== Hardware support ===
There is no known hardware with native support for octuple precision.

== See also ==
* [[IEEE 754]]
* [[IEEE 754-2008|IEEE Standard for Floating-Point Arithmetic (IEEE 754)]]
* [[Extended precision]]
* [[ISO/IEC 10967]], Language-independent arithmetic
* [[Primitive data type]]
* [[Scientific notation]]
* [[Half-precision floating-point format]]
* [[Single-precision floating-point format]]
* [[Double-precision floating-point format]]
* [[Quadruple-precision floating-point format]]

== References ==
{{reflist|refs=
<ref name="Crandall-Papadopoulos_2002">{{cite web |title=Octuple-precision floating point on Apple G4 (archived copy on web.archive.org) |author-first1=Richard E. |author-last1=Crandall |author-link1=Richard E. Crandall |author-first2=Jason S. |author-last2=Papadopoulos |date=2002-05-08 |url=http://images.apple.com/ca/acg/pdf/oct3a.pdf |url-status=unfit |archive-url=https://web.archive.org/web/20060728140052/http://images.apple.com/ca/acg/pdf/oct3a.pdf |archive-date=2006-07-28}} (8 pages)</ref>
}}

== Further reading ==
* {{cite book |author-first=Nelson H. F. |author-last=Beebe |title=The Mathematical-Function Computation Handbook - Programming Using the MathCW Portable Software Library |date=2017-08-22 |location=Salt Lake City, UT, USA |publisher=[[Springer International Publishing AG]] |edition=1 |lccn=2017947446 |isbn=978-3-319-64109-6 |doi=10.1007/978-3-319-64110-2 }}

{{data types}}

<!--This page is a slightly modified copy of the https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format page. I do not believe this is plagiarizing because it comes from Wikipedia, but if it is then please delete this page since it would be nearly impossible to fix this page. :)-->

[[Category:Binary arithmetic]]
[[Category:Floating point types]]