Line 1:
{{short description|256-bit computer number format}}
{{Use dmy dates|date=December 2022|cs1-dates=y}}
{{Use list-defined references|date=December 2022}}
 
{{Floating-point}}
{{Computer architecture bit widths}}
In [[computing]], '''octuple precision''' is a binary [[floating-point]]-based [[computer number format]] that occupies 32 [[byte]]s (256 [[bit]]s) in computer memory. This 256-bit format is intended for applications that require results in higher than [[quadruple precision]]. It is rarely implemented, and very little software or hardware supports it.
 
The range and precision of this format greatly exceed what is needed to describe any known physical quantity in the observable universe, even at resolutions finer than the [[Planck units]].
 
== IEEE 754 octuple-precision binary floating-point format: binary256 ==
In its 2008 revision, the [[IEEE 754]] standard specifies a '''binary256''' format among the ''interchange formats'' (it is not a basic format), as having:
 
* [[Sign bit]]: 1 bit
* [[Exponent]] width: 19 bits
The format is written with an implicit lead bit with value 1 unless the exponent is all zeros. Thus only 236 bits of the [[significand]] appear in the memory format, but the total precision is 237 bits (approximately 71 decimal digits: {{nowrap|log<sub>10</sub>(2<sup>237</sup>) ≈ 71.344}}).
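The equivalent decimal precision quoted above follows directly from the bit count; a minimal Python sketch (illustrative only, not part of any standard library for this format) reproduces it:

```python
import math

# binary256: 236 stored significand bits plus 1 implicit lead bit = 237 bits.
SIGNIFICAND_BITS = 237

# Equivalent decimal precision: log10(2**237) = 237 * log10(2).
decimal_digits = SIGNIFICAND_BITS * math.log10(2)
print(round(decimal_digits, 3))  # approximately 71.344
```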
The bits are laid out as follows:
 
[[File:Octuple precision visual demonstration.svg|1000px|Layout of octuple-precision floating-point format]]
 
=== Exponent encoding ===
 
The octuple-precision binary floating-point exponent is encoded using an [[offset binary]] representation, with the zero offset being 262143; this offset is also known as the exponent bias in the IEEE 754 standard.
 
* E<sub>min</sub> = −262142
* [[Exponent bias]] = 3FFFF<sub>16</sub> = 262143
 
Thus, as defined by the offset binary representation, in order to get the true exponent, the offset of 262143 has to be subtracted from the stored exponent.
 
The stored exponents 00000<sub>16</sub> and 7FFFF<sub>16</sub> are interpreted specially.
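The bias arithmetic can be sketched in a few lines of Python (the helper names are illustrative, not a standard API):

```python
BIAS = 0x3FFFF  # 262143, the binary256 exponent bias

def encode_exponent(true_exp: int) -> int:
    """Biased (stored) 19-bit exponent field for a normal number."""
    stored = true_exp + BIAS
    assert 0x00001 <= stored <= 0x7FFFE, "outside the normal-number range"
    return stored

def decode_exponent(stored: int) -> int:
    """True exponent recovered from the stored exponent field."""
    return stored - BIAS

print(hex(encode_exponent(0)))    # 1.0 is stored with exponent field 0x3ffff
print(decode_exponent(0x00001))   # E_min = -262142
print(decode_exponent(0x7FFFE))   # E_max = 262143
```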
 
{| class="wikitable" style="text-align: center;"
|-
! Exponent !! Significand zero !! Significand non-zero !! Equation
|-
| 00000<sub>16</sub> || [[0 (number)|0]], [[−0]] || [[subnormal numbers]] || <math>(-1)^{\text{signbit}} \times 2^{-262142} \times 0.\text{significandbits}_2</math>
|-
| 00001<sub>16</sub>, ..., 7FFFE<sub>16</sub> ||colspan=2| normalized value || <math>(-1)^{\text{signbit}} \times 2^{\text{exponentbits}_2 - 262143} \times 1.\text{significandbits}_2</math>
|-
| 7FFFF<sub>16</sub> || ±[[infinity|∞]] || [[NaN]] (quiet, signaling)
|}
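The two equations in the table can be exercised with a short Python sketch; the name <code>decode_binary256</code> is illustrative, and exact rational arithmetic (<code>Fraction</code>) is used so the check involves no rounding:

```python
from fractions import Fraction

EXP_BITS, SIG_BITS, BIAS = 19, 236, 262143

def decode_binary256(bits: int):
    """Decode a 256-bit pattern per the table above, exactly."""
    sign = (bits >> (EXP_BITS + SIG_BITS)) & 1
    exponent = (bits >> SIG_BITS) & ((1 << EXP_BITS) - 1)
    significand = bits & ((1 << SIG_BITS) - 1)
    sign_factor = -1 if sign else 1
    if exponent == 0x7FFFF:                        # all ones: infinity or NaN
        return sign_factor * float("inf") if significand == 0 else float("nan")
    frac = Fraction(significand, 1 << SIG_BITS)
    if exponent == 0:                              # all zeros: zero or subnormal
        return sign_factor * frac * Fraction(2) ** -262142
    return sign_factor * (1 + frac) * Fraction(2) ** (exponent - BIAS)

one = 0x3FFFF << 236            # sign 0, exponent 3FFFF, significand 0
assert decode_binary256(one) == 1
assert decode_binary256(one | (1 << 255)) == -1
```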
 
 
=== Octuple-precision examples ===
 
These examples are given in bit ''representation'', in [[hexadecimal]],
of the floating-point value. This includes the sign, (biased) exponent, and significand.
 
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = +0
8000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = −0

7fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = +infinity
ffff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = −infinity
 
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001<sub>16</sub>
= 2<sup>−262142</sup> × 2<sup>−236</sup> = 2<sup>−262378</sup>
≈ 2.24800708647703657297018614776265182597360918266100276294348974547709294462 × 10<sup>−78984</sup>
(smallest positive subnormal number)
 
0000 0fff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff<sub>16</sub>
= 2<sup>−262142</sup> × (1 − 2<sup>−236</sup>)
≈ 2.4824279514643497882993282229138717236776877060796468692709532979137875392 × 10<sup>−78913</sup>
(largest subnormal number)
 
0000 1000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub>
= 2<sup>−262142</sup>
≈ 2.48242795146434978829932822291387172367768770607964686927095329791378756168 × 10<sup>−78913</sup>
(smallest positive normal number)
 
7fff efff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff<sub>16</sub>
= 2<sup>262143</sup> × (2 − 2<sup>−236</sup>)
≈ 1.61132571748576047361957211845200501064402387454966951747637125049607182699 × 10<sup>78913</sup>
(largest normal number)
 
3fff efff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff<sub>16</sub>
= 1 − 2<sup>−237</sup>
≈ 0.999999999999999999999999999999999999999999999999999999999999999999999995472
(largest number less than one)
 
3fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub>
= 1 (one)
 
3fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001<sub>16</sub>
= 1 + 2<sup>−236</sup>
≈ 1.00000000000000000000000000000000000000000000000000000000000000000000000906
(smallest number larger than one)
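Several of the listed examples can be verified exactly with a short Python sketch; the helper <code>value_of</code> is illustrative, and <code>Fraction</code> keeps the arithmetic exact:

```python
from fractions import Fraction

SIG_BITS, BIAS = 236, 262143

def value_of(pattern: str) -> Fraction:
    """Exact value of a finite binary256 bit pattern written in hex."""
    bits = int(pattern.replace(" ", ""), 16)
    sign = -1 if bits >> 255 else 1
    exponent = (bits >> SIG_BITS) & 0x7FFFF
    frac = Fraction(bits & ((1 << SIG_BITS) - 1), 1 << SIG_BITS)
    if exponent == 0:                       # zero or subnormal
        return sign * frac * Fraction(2) ** (1 - BIAS)
    return sign * (1 + frac) * Fraction(2) ** (exponent - BIAS)

one = "3fff f000" + " 0000" * 14
assert value_of(one) == 1
assert value_of(one[:-1] + "1") == 1 + Fraction(1, 2 ** 236)  # smallest > 1
```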
 
By default, 1/3 rounds down like [[double precision]], because of the odd number of bits in the significand:
the bits beyond the rounding point are <code>0101...</code>, which is less than 1/2 of a [[unit in the last place]].
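This rounding behaviour can be checked exactly with a small Python sketch (illustrative only, using exact rational arithmetic):

```python
from fractions import Fraction

PRECISION = 237                 # binary256 significand bits (an odd number)

# 1/3 = 1.0101..._2 x 2^-2; scale so the 237-bit significand becomes the
# integer part, leaving the discarded tail as a fractional remainder.
scaled = Fraction(1, 3) * 2 ** 2 * 2 ** (PRECISION - 1)
truncated = scaled.numerator // scaled.denominator
remainder = scaled - truncated  # the bits beyond the rounding point

# The tail is 0.0101..._2 = 1/3 of a ulp, which is less than 1/2 ulp,
# so round-to-nearest keeps the truncated value: 1/3 rounds down.
assert remainder == Fraction(1, 3)
assert remainder < Fraction(1, 2)
assert Fraction(truncated, 2 ** (PRECISION + 1)) < Fraction(1, 3)
```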
 
== Implementations ==
Octuple precision is rarely implemented since usage of it is extremely rare. [[Apple Inc.]] had an implementation of addition, subtraction and multiplication of octuple-precision numbers with a 224-bit [[two's complement]] significand and a 32-bit exponent.<ref name="Crandall-Papadopoulos_2002"/> One can use general [[arbitrary-precision arithmetic]] libraries to obtain octuple (or higher) precision, but specialized octuple-precision implementations may achieve higher performance.
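As an illustration of the arbitrary-precision route, Python's standard <code>decimal</code> module (which performs decimal, not binary, arithmetic, so it is only a stand-in for a true binary256 library) can be configured to the roughly 71 significant decimal digits that binary256 provides:

```python
from decimal import Decimal, getcontext

# binary256 is good for about 71 significant decimal digits; configure the
# stdlib decimal context to the same working precision.
getcontext().prec = 71
third = Decimal(1) / Decimal(3)
print(third)   # 0. followed by 71 threes
```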
 
=== Hardware support ===
There is no known hardware with native support for octuple-precision arithmetic. Moving a single 256-bit value across the main data bus requires several machine-word transfers:
* [[8-bit|8-bit architecture]] – at least 32 separate transfers
* [[16-bit|16-bit architecture]] – at least 16 separate transfers
* [[x86|32-bit x86 architecture]] – at least 8 separate transfers
* [[x86-64|64-bit x64 architecture]] – at least 4 separate transfers

Software octuple-precision arithmetic on current architectures is therefore considerably slower than the precisions supported natively in hardware.
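The transfer counts above can be reproduced with a short Python sketch (the helper name is illustrative):

```python
def machine_words(value: int, word_bits: int) -> list[int]:
    """Split a 256-bit value into little-endian machine words."""
    mask = (1 << word_bits) - 1
    return [(value >> (i * word_bits)) & mask for i in range(256 // word_bits)]

x = (0x3FFFF << 236) | 1            # an arbitrary 256-bit pattern
for word_bits in (8, 16, 32, 64):
    print(word_bits, "->", len(machine_words(x, word_bits)), "transfers")
```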
 
== See also ==
* [[IEEE 754-2008|IEEE Standard for Floating-Point Arithmetic (IEEE 754)]]
* [[Extended precision]]
* [[ISO/IEC 10967]], Language-independent arithmetic
* [[Primitive data type]]
* [[Scientific notation]]
* [[Half-precision floating-point format]]
* [[Single-precision floating-point format]]
* [[Double-precision floating-point format]]
* [[Quadruple-precision floating-point format]]
 
== References ==
{{reflist|refs=
<ref name="Crandall-Papadopoulos_2002">{{cite web |title=Octuple-precision floating point on Apple G4 (archived copy on web.archive.org) |author-first1=Richard E. |author-last1=Crandall |author-link1=Richard E. Crandall |author-first2=Jason S. |author-last2=Papadopoulos |date=2002-05-08 |url=http://images.apple.com/ca/acg/pdf/oct3a.pdf |url-status=unfit |archive-url=https://web.archive.org/web/20060728140052/http://images.apple.com/ca/acg/pdf/oct3a.pdf |archive-date=2006-07-28}} (8 pages)</ref>
}}
 
== Further reading ==
* {{cite book |author-first=Nelson H. F. |author-last=Beebe |title=The Mathematical-Function Computation Handbook - Programming Using the MathCW Portable Software Library |date=2017-08-22 |___location=Salt Lake City, UT, USA |publisher=[[Springer International Publishing AG]] |edition=1 |lccn=2017947446 |isbn=978-3-319-64109-6 |doi=10.1007/978-3-319-64110-2 }}
 
{{data types}}
 
[[Category:Binary arithmetic]]
[[Category:Floating point types]]