These examples are given in bit representation, in hexadecimal, of the floating-point value. This includes the sign, (biased) exponent, and significand.
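As an illustration of that layout, the following Python sketch decodes such a bit pattern into its three fields and reconstructs the exact value with rational arithmetic (the function name <code>decode_binary256</code> and the hex-string input format are illustrative choices here, not part of any standard library):

<syntaxhighlight lang="python">
from fractions import Fraction

# IEEE 754 binary256 field widths: 1 sign bit, 19 exponent bits,
# 236 stored significand bits; exponent bias 2**18 - 1 = 262143.
EXP_BITS, FRAC_BITS = 19, 236
BIAS = (1 << (EXP_BITS - 1)) - 1  # 262143

def decode_binary256(hex_string):
    """Decode a 64-hex-digit bit pattern into an exact Fraction,
    or a string for infinities and NaNs."""
    bits = int(hex_string.replace(" ", ""), 16)
    sign = bits >> (EXP_BITS + FRAC_BITS)
    exponent = (bits >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
    fraction = bits & ((1 << FRAC_BITS) - 1)
    if exponent == (1 << EXP_BITS) - 1:    # exponent all ones: infinity or NaN
        return "nan" if fraction else ("-inf" if sign else "+inf")
    if exponent == 0:                      # zero or subnormal: no implicit 1
        value = Fraction(fraction, 1 << FRAC_BITS) * Fraction(2) ** (1 - BIAS)
    else:                                  # normal: implicit leading 1 bit
        value = (1 + Fraction(fraction, 1 << FRAC_BITS)) * Fraction(2) ** (exponent - BIAS)
    # Note: a negative zero decodes to plain Fraction(0); the sign is lost.
    return -value if sign else value

# The encoding of one: biased exponent 0x3ffff (= bias), zero fraction.
assert decode_binary256("3fff f000" + " 0000" * 14) == 1
</syntaxhighlight>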
 
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = +0
8000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = −0
 
7fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = +infinity
ffff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = −infinity
 
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001<sub>16</sub> = 2<sup>−262142</sup> × 2<sup>−236</sup> = 2<sup>−262378</sup>
≈ 2.24800708647703657297018614776265182597360918266100276294348974547709294462 × 10<sup>−78984</sup>
(smallest positive subnormal number)
 
0000 0fff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff<sub>16</sub>
= 2<sup>−262142</sup> × (1 − 2<sup>−236</sup>)
≈ 2.4824279514643497882993282229138717236776877060796468692709532979137875392 × 10<sup>−78913</sup>
(largest subnormal number)
 
0000 1000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub>
= 2<sup>−262142</sup>
≈ 2.48242795146434978829932822291387172367768770607964686927095329791378756168 × 10<sup>−78913</sup>
(smallest positive normal number)
 
7fff efff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff<sub>16</sub>
= 2<sup>262143</sup> × (2 − 2<sup>−236</sup>)
≈ 1.61132571748576047361957211845200501064402387454966951747637125049607182699 × 10<sup>78913</sup>
(largest normal number)
 
3fff efff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff<sub>16</sub>
= 1 − 2<sup>−237</sup>
≈ 0.999999999999999999999999999999999999999999999999999999999999999999999995472
(largest number less than one)
 
3fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub>
= 1 (one)
 
3fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001<sub>16</sub>
= 1 + 2<sup>−236</sup>
≈ 1.00000000000000000000000000000000000000000000000000000000000000000000000906
(smallest number larger than one)
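The long decimal approximations above can be reproduced with exact rational arithmetic; for example, the following sketch prints the smallest and largest normal numbers (80 significant digits is an arbitrary choice, enough to cover the strings quoted above):

<syntaxhighlight lang="python">
from decimal import Decimal, getcontext
from fractions import Fraction

getcontext().prec = 80  # enough digits to cover the approximations above

def to_decimal(x: Fraction) -> Decimal:
    """Approximate an exact Fraction to 80 significant decimal digits."""
    return Decimal(x.numerator) / Decimal(x.denominator)

# Smallest positive normal number: 2**-262142
print(to_decimal(Fraction(1, 2 ** 262142)))                   # ≈ 2.4824279514...E-78913

# Largest normal number: 2**262143 * (2 - 2**-236)
print(to_decimal((2 - Fraction(1, 2 ** 236)) * 2 ** 262143))  # ≈ 1.6113257174...E+78913
</syntaxhighlight>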
 
By default, 1/3 rounds down like [[double precision]], because of the odd number of bits in the significand: the bits beyond the rounding point are 0101..., which is less than 1/2 of a unit in the last place.
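This parity argument can be checked directly with exact arithmetic. A minimal sketch (the rounding helper below implements round-to-nearest, ties-to-even, from scratch and is not taken from any library):

<syntaxhighlight lang="python">
from fractions import Fraction

def round_significand(x: Fraction, precision: int) -> Fraction:
    """Round x, assumed to lie in [1, 2), to `precision` significant bits
    using round-to-nearest, ties-to-even."""
    scaled = x * 2 ** (precision - 1)          # integer part = the significand
    floor = scaled.numerator // scaled.denominator
    remainder = scaled - floor
    if remainder > Fraction(1, 2) or (remainder == Fraction(1, 2) and floor % 2):
        floor += 1
    return Fraction(floor, 2 ** (precision - 1))

# 1/3 = (4/3) * 2**-2, so round the significand 4/3 and rescale.
for p in (24, 53, 237):                        # single, double, octuple precision
    rounded = round_significand(Fraction(4, 3), p) / 4
    print(p, "bits:", "rounds up" if rounded > Fraction(1, 3) else "rounds down")
# Output: 24 bits rounds up; 53 and 237 bits (odd precisions) round down.
</syntaxhighlight>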