Octuple-precision floating-point format
 
== IEEE 754 octuple-precision binary floating-point format: binary256 ==
 
In its 2008 revision, the [[IEEE 754]] standard specifies a '''binary256''' format among the ''interchange formats'' (it is not a basic format), as having:
* [[Sign bit]]: 1 bit
 
=== Exponent encoding ===
 
The octuple-precision binary floating-point exponent is encoded using an [[offset binary]] representation, with a zero offset of 262143 (2<sup>18</sup> − 1), also known as the exponent bias in the IEEE 754 standard.
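The following is a minimal Python sketch of this encoding. It assumes the binary256 field widths of 1 sign bit, a 19-bit exponent field (consistent with the bias 2<sup>18</sup> − 1 = 262143 and the all-ones exponent 7FFFF<sub>16</sub>) and 236 stored significand bits. The helper names <code>split_binary256</code> and <code>unbiased_exponent</code> are hypothetical and do not belong to any library.

```python
# Assumed binary256 layout: 1 sign bit, 19 exponent bits, 236 stored fraction bits.
EXP_BITS = 19
FRAC_BITS = 236
BIAS = (1 << (EXP_BITS - 1)) - 1  # 2**18 - 1 = 262143


def split_binary256(bits: int) -> tuple[int, int, int]:
    """Split a 256-bit pattern into (sign, biased exponent, stored fraction)."""
    if not 0 <= bits < (1 << 256):
        raise ValueError("expected a 256-bit unsigned integer")
    sign = bits >> (EXP_BITS + FRAC_BITS)
    biased_exp = (bits >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
    fraction = bits & ((1 << FRAC_BITS) - 1)
    return sign, biased_exp, fraction


def unbiased_exponent(biased_exp: int) -> int:
    """Remove the offset-binary bias from a normal number's exponent field."""
    return biased_exp - BIAS


# 1.0 is stored as 3FFFF (sign + exponent) followed by 236 zero bits.
one = 0x3FFFF << FRAC_BITS
print(split_binary256(one))       # (0, 262143, 0)
print(unbiased_exponent(262143))  # 0
```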
 
The stored exponents 00000<sub>16</sub> and 7FFFF<sub>16</sub> are interpreted specially.
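These special cases can be sketched as a classification on the exponent field. The hypothetical <code>classify_binary256</code> helper below assumes the same 1/19/236-bit layout and the usual IEEE 754 convention: a stored exponent of 00000<sub>16</sub> means zero or a subnormal number, and 7FFFF<sub>16</sub> means infinity (zero significand) or NaN (non-zero significand).

```python
EXP_BITS = 19
FRAC_BITS = 236
EXP_ALL_ONES = (1 << EXP_BITS) - 1  # 7FFFF in hexadecimal


def classify_binary256(bits: int) -> str:
    """Classify a binary256 bit pattern by its exponent and significand fields."""
    sign = "-" if bits >> (EXP_BITS + FRAC_BITS) else "+"
    biased_exp = (bits >> FRAC_BITS) & EXP_ALL_ONES
    fraction = bits & ((1 << FRAC_BITS) - 1)
    if biased_exp == 0:
        # Stored exponent 00000: zero, or subnormal (no implicit leading 1).
        return sign + ("zero" if fraction == 0 else "subnormal")
    if biased_exp == EXP_ALL_ONES:
        # Stored exponent 7FFFF: infinity, or NaN when the significand is non-zero.
        return (sign + "infinity") if fraction == 0 else "NaN"
    return sign + "normal"


print(classify_binary256(0x7FFFF << FRAC_BITS))        # +infinity
print(classify_binary256(1))                           # +subnormal
print(classify_binary256((0x7FFFF << FRAC_BITS) | 1))  # NaN
```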
 
{| class="wikitable" style="text-align: center;"
|-
! Exponent !! Significand zero !! Significand non-zero !! Equation
|-
 
=== Octuple-precision examples ===
 
These examples show the bit ''representation'' of each floating-point value in [[hexadecimal]], including the sign, the (biased) exponent, and the significand.
 
<pre<includeonly></includeonly>
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = +0
8000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = −0
</pre>
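To reproduce the hexadecimal layout used in these examples, a 256-bit pattern can be printed as 64 hexadecimal digits in groups of four. The sketch below (the name <code>to_hex_groups</code> is illustrative) also shows that −0 differs from +0 only in the sign bit, bit 255.

```python
def to_hex_groups(bits: int) -> str:
    """Format a 256-bit pattern as 64 hexadecimal digits in groups of four."""
    digits = format(bits, "064x")  # zero-padded lowercase hexadecimal
    return " ".join(digits[i:i + 4] for i in range(0, 64, 4))


POSITIVE_ZERO = 0
NEGATIVE_ZERO = 1 << 255  # only the sign bit is set

print(to_hex_groups(POSITIVE_ZERO))  # 0000 0000 ... 0000
print(to_hex_groups(NEGATIVE_ZERO))  # 8000 0000 ... 0000
```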
 
<pre<includeonly></includeonly>
7fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = +infinity
ffff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = −infinity
</pre>
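Going the other way, a pattern can be assembled from its fields. In this sketch, the hypothetical <code>pack_binary256</code> assumes the 1/19/236-bit layout; setting the exponent field to all ones with a zero significand yields the two infinities shown above.

```python
EXP_BITS = 19
FRAC_BITS = 236
EXP_ALL_ONES = (1 << EXP_BITS) - 1


def pack_binary256(sign: int, biased_exp: int, fraction: int) -> int:
    """Assemble a 256-bit pattern from sign, biased exponent and stored fraction."""
    if sign not in (0, 1):
        raise ValueError("sign must be 0 or 1")
    if not 0 <= biased_exp <= EXP_ALL_ONES:
        raise ValueError("biased exponent must fit in 19 bits")
    if not 0 <= fraction < (1 << FRAC_BITS):
        raise ValueError("fraction must fit in 236 bits")
    return (sign << (EXP_BITS + FRAC_BITS)) | (biased_exp << FRAC_BITS) | fraction


pos_inf = pack_binary256(0, EXP_ALL_ONES, 0)
neg_inf = pack_binary256(1, EXP_ALL_ONES, 0)
print(format(pos_inf, "064x")[:8])  # 7ffff000
print(format(neg_inf, "064x")[:8])  # fffff000
```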
 
<pre<includeonly></includeonly>
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001<sub>16</sub>
= 2<sup>−262142</sup> × 2<sup>−236</sup> = 2<sup>−262378</sup>
≈ 2.24800708647703657297018614776265182597360918266100276294348974547709294462 × 10<sup>−78984</sup>
(smallest positive subnormal number)
</pre>
 
<pre<includeonly></includeonly>
0000 0fff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff<sub>16</sub>
= 2<sup>−262142</sup> × (1 − 2<sup>−236</sup>)
≈ 2.4824279514643497882993282229138717236776877060796468692709532979137875392 × 10<sup>−78913</sup>
(largest subnormal number)
</pre>
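The subnormal values above can be checked with exact rational arithmetic. This sketch uses Python's <code>fractions.Fraction</code> for the exact values and the <code>decimal</code> module for a decimal approximation; <code>subnormal_value</code> is an illustrative name, and the 25-digit precision is an arbitrary choice.

```python
from decimal import Decimal, localcontext
from fractions import Fraction

FRAC_BITS = 236    # stored significand bits
EMIN = -262142     # exponent of the smallest positive normal number


def subnormal_value(fraction: int) -> Fraction:
    """Exact value of a positive subnormal: 2**EMIN * (fraction / 2**236)."""
    if not 0 < fraction < (1 << FRAC_BITS):
        raise ValueError("a subnormal fraction is non-zero and fits in 236 bits")
    return Fraction(fraction, 1 << FRAC_BITS) / (1 << -EMIN)


smallest = subnormal_value(1)                    # 0000 0000 ... 0001
largest = subnormal_value((1 << FRAC_BITS) - 1)  # 0000 0fff ... ffff

print(smallest == Fraction(1, 1 << 262378))                                 # True
print(largest == Fraction(1, 1 << 262142) * (1 - Fraction(1, 1 << 236)))  # True

# Decimal approximation of the smallest subnormal, 2**-262378, to 25 digits.
with localcontext() as ctx:
    ctx.prec = 25
    print(Decimal(2) ** -262378)
```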
 
<pre<includeonly></includeonly>
0000 1000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub>
= 2<sup>−262142</sup>
≈ 2.48242795146434978829932822291387172367768770607964686927095329791378756168 × 10<sup>−78913</sup>
(smallest positive normal number)
</pre>
 
<pre<includeonly></includeonly>
7fff efff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff<sub>16</sub>
= 2<sup>262143</sup> × (2 − 2<sup>−236</sup>)
≈ 1.61132571748576047361957211845200501064402387454966951747637125049607182699 × 10<sup>78913</sup>
(largest normal number)
</pre>
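A general decoder for finite values applies the implicit leading 1 to normal numbers and the fixed exponent 1 − 262143 = −262142 to subnormals. The following is a sketch under the same layout assumptions, using the hypothetical name <code>decode_finite</code>; it reproduces the smallest and largest normal numbers from their bit patterns.

```python
from fractions import Fraction

EXP_BITS, FRAC_BITS = 19, 236
BIAS = 262143


def decode_finite(bits: int) -> Fraction:
    """Exact value of a finite binary256 pattern (zero, subnormal or normal)."""
    sign = -1 if bits >> (EXP_BITS + FRAC_BITS) else 1
    biased_exp = (bits >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
    fraction = bits & ((1 << FRAC_BITS) - 1)
    if biased_exp == (1 << EXP_BITS) - 1:
        raise ValueError("infinity or NaN has no finite value")
    if biased_exp == 0:
        # Zero or subnormal: no implicit leading 1, exponent fixed at 1 - BIAS.
        significand = Fraction(fraction, 1 << FRAC_BITS)
        exponent = 1 - BIAS
    else:
        # Normal: implicit leading 1.
        significand = 1 + Fraction(fraction, 1 << FRAC_BITS)
        exponent = biased_exp - BIAS
    return sign * significand * Fraction(2) ** exponent


# Smallest positive normal (0000 1000 ... 0000) and largest normal (7fff efff ... ffff).
min_normal = decode_finite(0x00001 << FRAC_BITS)
max_normal = decode_finite(int("7fffe" + "f" * 59, 16))
print(min_normal == Fraction(1, 2**262142))                          # True
print(max_normal == Fraction(2)**262143 * (2 - Fraction(1, 2**236)))  # True
```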
 
<pre<includeonly></includeonly>
3fff efff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff<sub>16</sub>
= 2<sup>−1</sup> × (2 − 2<sup>−236</sup>)
= 1 − 2<sup>−237</sup>
≈ 0.999999999999999999999999999999999999999999999999999999999999999999999995472
(largest number less than one)
</pre>
 
<pre<includeonly></includeonly>
3fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub>
= 1 (one)
</pre>
 
<pre<includeonly></includeonly>
3fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001<sub>16</sub>
= 2<sup>0</sup> × (1 + 2<sup>−236</sup>)
= 1 + 2<sup>−236</sup>
≈ 1.00000000000000000000000000000000000000000000000000000000000000000000000906
(smallest number larger than one)
</pre>
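For positive numbers, consecutive bit patterns are consecutive values, so the neighbours of one can be found by decrementing and incrementing its pattern. This sketch (the name <code>decode_normal</code> is hypothetical) assumes positive normal inputs only and confirms that the gap below one is 2<sup>−237</sup> while the gap above it is 2<sup>−236</sup>.

```python
from fractions import Fraction

FRAC_BITS = 236
BIAS = 262143


def decode_normal(bits: int) -> Fraction:
    """Exact value of a positive normal binary256 pattern (sign bit assumed 0)."""
    biased_exp = bits >> FRAC_BITS           # no sign bit to mask off here
    fraction = bits & ((1 << FRAC_BITS) - 1)
    return (1 + Fraction(fraction, 1 << FRAC_BITS)) * Fraction(2) ** (biased_exp - BIAS)


ONE = 0x3FFFF << FRAC_BITS      # 3fff f000 ... 0000
below = decode_normal(ONE - 1)  # 3fff efff ... ffff
above = decode_normal(ONE + 1)  # 3fff f000 ... 0001

print(decode_normal(ONE) == 1)           # True
print(1 - below == Fraction(1, 2**237))  # True
print(above - 1 == Fraction(1, 2**236))  # True
```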
 
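By default (round to nearest), 1/3 rounds down, as in [[double precision]], because the significand has an odd number of bits. Since 1/3 = 0.010101...<sub>2</sub>, a 237-bit significand starting at the first 1 bit ends on a 1, so the bits beyond the rounding point are <code>0101...</code>, which is less than half of a [[unit in the last place]].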
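The rounding of 1/3 can be checked with exact arithmetic. This sketch rounds a positive rational to a given number of significant bits with round-to-nearest, ties-to-even, and ignores exponent range limits; <code>round_significand</code> is a hypothetical name. With 53 bits it should agree with the double-precision value Python computes for <code>1 / 3</code>.

```python
from fractions import Fraction


def round_significand(x: Fraction, precision: int = 237) -> tuple[Fraction, Fraction]:
    """Round positive x to `precision` significant bits, ties to even.

    Returns (rounded value, discarded part in units in the last place).
    Exponent range limits (overflow, subnormals) are ignored in this sketch.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    # Find e with 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    ulp = Fraction(2) ** (e - precision + 1)
    scaled = x / ulp                       # x measured in units in the last place
    q, r = divmod(scaled.numerator, scaled.denominator)
    discarded = Fraction(r, scaled.denominator)
    if discarded > Fraction(1, 2) or (discarded == Fraction(1, 2) and q % 2 == 1):
        q += 1
    return q * ulp, discarded


third = Fraction(1, 3)
octuple, rest = round_significand(third)       # 237 bits (binary256)
double, rest53 = round_significand(third, 53)  # 53 bits (binary64)
print(octuple < third, rest)       # True 1/3
print(double == Fraction(1 / 3))   # True
```

The discarded part is 1/3 of a unit in the last place in both formats, which is the <code>0101...</code> tail described above.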
 
== Implementations ==
Octuple precision is rarely implemented because it is very rarely used. [[Apple Inc.]] had an implementation of addition, subtraction and multiplication of octuple-precision numbers with a 224-bit [[two's complement]] significand and a 32-bit exponent.<ref name="Crandall-Papadopoulos_2002"/> General [[arbitrary-precision arithmetic]] libraries can be used to obtain octuple (or higher) precision, but specialized octuple-precision implementations may achieve higher performance.
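As an illustration of the arbitrary-precision approach, the following Python sketch uses built-in big integers and <code>fractions.Fraction</code> as the arbitrary-precision layer: a result is computed exactly, then rounded once to the nearest binary256 value (ties to even) and packed into a 256-bit pattern. The function <code>encode_normal</code> is hypothetical; it assumes the result is a normal number and does not handle zero, subnormals, overflow, infinities or NaN. It says nothing about how any particular library works or how fast it is.

```python
from fractions import Fraction

PRECISION = 237            # significand bits including the implicit leading 1
FRAC_BITS = PRECISION - 1  # 236 stored bits
BIAS = 262143


def encode_normal(x: Fraction) -> int:
    """Round non-zero x to the nearest binary256 value (ties to even) and pack it.

    Sketch only: assumes the result is a normal number.
    """
    if x == 0:
        raise ValueError("zero is not handled in this sketch")
    sign = 1 if x < 0 else 0
    x = abs(x)
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1                                   # now 2**e <= x < 2**(e + 1)
    scaled = x / Fraction(2) ** (e - FRAC_BITS)  # in [2**236, 2**237)
    q, r = divmod(scaled.numerator, scaled.denominator)
    rem = Fraction(r, scaled.denominator)
    if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and q % 2 == 1):
        q += 1
    if q == 1 << PRECISION:                      # rounding carried into the next binade
        q >>= 1
        e += 1
    biased_exp = e + BIAS
    if not 0 < biased_exp < (1 << 19) - 1:
        raise OverflowError("result is outside the normal range")
    fraction = q - (1 << FRAC_BITS)              # drop the implicit leading 1
    return (sign << 255) | (biased_exp << FRAC_BITS) | fraction


# 1/3 + 1/3 is computed exactly, then rounded once.
bits = encode_normal(Fraction(1, 3) + Fraction(1, 3))
print(format(bits, "064x"))  # 3fffe followed by 59 fives
```

Because the sum is exact before the single rounding step, the packed result is correctly rounded.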
 
=== Hardware support ===
There is no known hardware with native support for octuple precision.
 
== See also ==
}}
 
== Further reading ==
* {{cite book |author-first=Nelson H. F. |author-last=Beebe |title=The Mathematical-Function Computation Handbook - Programming Using the MathCW Portable Software Library |date=2017-08-22 |location=Salt Lake City, UT, USA |publisher=[[Springer International Publishing AG]] |edition=1 |lccn=2017947446 |isbn=978-3-319-64109-6 |doi=10.1007/978-3-319-64110-2 }}