Quadruple-precision floating-point format

 
* As the magnitude of the value decreases, the amount of extra precision also decreases. Consequently, the range over which full double-double precision is available is narrower than the normalized range of double precision. The smallest number with full precision is {{nowrap|1000...0<sub>2</sub> (106 zeros) × 2<sup>−1074</sup>}}, that is, {{nowrap|1.000...0<sub>2</sub> (106 zeros) × 2<sup>−968</sup>}}. Numbers whose magnitude is smaller than 2<sup>−1021</sup> have no additional precision compared with double precision.
* The actual number of bits of precision can vary. In general, the magnitude of the low-order part of the number is no greater than half a [[Unit in the last place|ULP]] of the high-order part. If the low-order part is less than half a ULP of the high-order part, significant bits (either all 0s or all 1s) are implied between the significands of the high-order and low-order numbers. Certain algorithms that rely on having a fixed number of bits in the significand can fail when using 128-bit long double numbers.
* For the same reason, it is possible to represent values like {{nowrap|1 + 2<sup>−1074</sup>}}, which is the smallest representable number greater than 1.
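The last point can be sketched in a few lines of Python (a hypothetical illustration, not code from any particular double-double library; it assumes Python floats are IEEE 754 binary64). A double-double value is stored as an unevaluated sum of two doubles, and Knuth's error-free two-sum transformation recovers the rounding error of an addition exactly:

```python
def two_sum(a, b):
    """Error-free addition: returns (s, e) such that s + e == a + b exactly,
    where s is the rounded sum and e is the rounding error."""
    s = a + b
    bv = s - a                     # the part of b that made it into s
    e = (a - (s - bv)) + (b - bv)  # what was lost to rounding
    return s, e

tiny = 2.0 ** -1074   # smallest positive subnormal double

# In plain double precision the tiny term is lost entirely:
print(1.0 + tiny == 1.0)   # True

# As a double-double pair (hi, lo), 1 + 2**-1074 survives:
hi, lo = two_sum(1.0, tiny)
print(hi)            # 1.0
print(lo == tiny)    # True -- the low part preserves the tiny term
```

Here the pair `(hi, lo)` represents {{nowrap|1 + 2<sup>−1074</sup>}} exactly even though the gap between the two parts spans far more than 53 bits, which is precisely the "implied bits" effect described above.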