Revision as of 20:53, 15 June 2025 by WaterQuark (283 edits): m Fix capitalization ("Planck" is a proper name). Tag: Visual edit
Revision as of 19:33, 10 July 2025 by 208.114.63.4: No edit summary
Line 9:
== IEEE 754 octuple-precision binary floating-point format: binary256 ==
In its 2008 revision, the [[IEEE 754]] standard specifies a '''binary256''' format among the ''interchange formats'' (it is not a basic format), as having:
* [[Sign bit]]: 1 bit

Line 23 ⟶ 22:
=== Exponent encoding ===
The octuple-precision binary floating-point exponent is encoded using an [[offset binary]] representation, with a zero offset of 262143, also known as the exponent bias in the IEEE 754 standard.

Line 34 ⟶ 32:
The stored exponents 00000<sub>16</sub> and 7FFFF<sub>16</sub> are interpreted specially.
{| class="wikitable" style="text-align: center;"
|-
! Exponent !! Significand zero !! Significand non-zero !! Equation
|-

Line 49 ⟶ 48:
=== Octuple-precision examples ===
These examples are given in bit ''representation'', in [[hexadecimal]], of the floating-point value. This includes the sign, (biased) exponent, and significand.

<pre<includeonly></includeonly>>
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = +0
8000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = −0
</pre>

<pre<includeonly></includeonly>>
7fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = +infinity
ffff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = −infinity
</pre>

<pre<includeonly></includeonly>>
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001<sub>16</sub> ~~= 2<sup>−262142</sup> × 2<sup>−236</sup> = 2<sup>−262378</sup>~~
 = 2<sup>−262142</sup> × 2<sup>−236</sup> = 2<sup>−262378</sup>
 ≈ 2.24800708647703657297018614776265182597360918266100276294348974547709294462 × 10<sup>−78984</sup>
 (smallest positive subnormal number)
</pre>

<pre<includeonly></includeonly>>
0000 0fff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff<sub>16</sub> ~~= 2<sup>−262142</sup> × (1 − 2<sup>−236</sup>)~~
 = 2<sup>−262142</sup> × (1 − 2<sup>−236</sup>)
 ≈ 2.4824279514643497882993282229138717236776877060796468692709532979137875392 × 10<sup>−78913</sup>
 (largest subnormal number)
</pre>

<pre<includeonly></includeonly>>
0000 1000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> ~~= 2<sup>−262142</sup>~~
 = 2<sup>−262142</sup>
 ≈ 2.48242795146434978829932822291387172367768770607964686927095329791378756168 × 10<sup>−78913</sup>
 (smallest positive normal number)
</pre>

<pre<includeonly></includeonly>>
7fff efff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff<sub>16</sub> ~~= 2<sup>262143</sup> × (2 − 2<sup>−236</sup>)~~
 = 2<sup>262143</sup> × (2 − 2<sup>−236</sup>)
 ≈ 1.61132571748576047361957211845200501064402387454966951747637125049607182699 × 10<sup>78913</sup>
 (largest normal number)
</pre>

<pre<includeonly></includeonly>>
3fff efff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff<sub>16</sub> ~~= 1 − 2<sup>−237</sup>~~
 = 1 − 2<sup>−237</sup>
 ≈ 0.999999999999999999999999999999999999999999999999999999999999999999999995472
 (largest number less than one)
</pre>

<pre<includeonly></includeonly>>
3fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = 1 (one)
</pre>

<pre<includeonly></includeonly>>
3fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001<sub>16</sub> ~~= 1 + 2<sup>−236</sup>~~
 = 1 + 2<sup>−236</sup>
 ≈ 1.00000000000000000000000000000000000000000000000000000000000000000000000906
 (smallest number larger than one)
</pre>

By default, 1/3 rounds down, as in [[double precision]], because the significand has an odd number of bits: the bits beyond the rounding point are <code>0101...</code>, which is less than 1/2 of a [[unit in the last place]].

== Implementations ==
Octuple precision is rarely implemented, since its use is extremely rare. [[Apple Inc.]] had an implementation of addition, subtraction and multiplication of octuple-precision numbers with a 224-bit [[two's complement]] significand and a 32-bit exponent.<ref name="Crandall-Papadopoulos_2002"/> One can use general [[arbitrary-precision arithmetic]] libraries to obtain octuple (or higher) precision, but specialized octuple-precision implementations may achieve higher performance.

=== Hardware support ===
There is no known hardware ~~implementation of~~with native support for octuple precision.

== See also ==

Line 112 ⟶ 128:
}}

== Further reading ==
* {{cite book |author-first=Nelson H. F. |author-last=Beebe |title=The Mathematical-Function Computation Handbook - Programming Using the MathCW Portable Software Library |date=2017-08-22 |location=Salt Lake City, UT, USA |publisher=[[Springer International Publishing AG]] |edition=1 |lccn=2017947446 |isbn=978-3-319-64109-6 |doi=10.1007/978-3-319-64110-2 }}
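
The bit patterns listed under "Octuple-precision examples" can be checked mechanically. The following is a minimal sketch in Python, not part of either revision. It assumes the layout those examples imply (1 sign bit, 19 exponent bits, 236 stored significand bits, bias 262143), and the names <code>decode_binary256</code> and <code>third_remainder</code> are hypothetical helpers introduced only for illustration. It uses exact rational arithmetic from the standard library instead of any octuple-precision implementation.

```python
from fractions import Fraction

# Assumed binary256 layout, taken from the examples in the text:
# 1 sign bit, 19 exponent bits, 236 stored significand bits, bias 262143.
EXP_BITS = 19
FRAC_BITS = 236
BIAS = 262143
EXP_MAX = (1 << EXP_BITS) - 1  # 7FFFF hex, reserved for infinity and NaN


def decode_binary256(hex_str):
    """Decode a 64-hex-digit bit pattern (hypothetical helper).

    Returns an exact Fraction for finite values and a float for
    infinities and NaN. The sign of zero is not preserved.
    """
    digits = hex_str.replace(" ", "")
    if len(digits) != 64:
        raise ValueError("expected 64 hexadecimal digits")
    bits = int(digits, 16)
    sign = bits >> 255
    exponent = (bits >> FRAC_BITS) & EXP_MAX
    fraction = bits & ((1 << FRAC_BITS) - 1)

    if exponent == EXP_MAX:
        if fraction:
            return float("nan")
        return float("-inf") if sign else float("inf")

    significand = Fraction(fraction, 1 << FRAC_BITS)
    if exponent == 0:
        # Zero or subnormal: no implicit leading 1, exponent is 1 - BIAS.
        magnitude = significand * Fraction(2) ** (1 - BIAS)
    else:
        # Normal number: implicit leading 1, exponent is stored value - BIAS.
        magnitude = (1 + significand) * Fraction(2) ** (exponent - BIAS)
    return -magnitude if sign else magnitude


def third_remainder():
    """Fraction of a unit in the last place discarded when rounding 1/3."""
    # 1/3 = 1.0101...b x 2**-2, so scaling by 2**(FRAC_BITS + 2) counts ulps.
    scaled = Fraction(1, 3) * 2 ** (FRAC_BITS + 2)
    return scaled - (scaled.numerator // scaled.denominator)


if __name__ == "__main__":
    one = decode_binary256("3fff f000" + " 0000" * 14)
    below_one = decode_binary256("3fff efff" + " ffff" * 14)
    print(one == 1, below_one == 1 - Fraction(1, 2 ** 237))
    print(third_remainder() < Fraction(1, 2))  # remainder below half an ulp
```

Working with <code>Fraction</code> keeps every value exact, so each comparison is an equality test against the closed-form expression and not a comparison of decimal approximations. The cost is that signed zero cannot be represented, which this sketch simply ignores.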

Octuple-precision floating-point format: Difference between revisions