Octuple-precision floating-point format: Difference between revisions

== IEEE 754 octuple-precision binary floating-point format: binary256 ==
 
The [[IEEE 754]] standard specifies a '''binary256''' format among the ''interchange formats'' (it is not a basic format), as having:
* [[Sign bit]]: 1 bit
* [[Exponent]] width: 19 bits
* [[Significand]] [[precision (arithmetic)|precision]]: 237 bits (236 explicitly stored)
<!-- "significand", with a d at the end, is a technical term, please do not confuse with "significant" -->
 
The format is written with an implicit lead bit with value 1 unless the exponent is all zeros. Thus only 236 bits of the [[significand]] appear in the memory format, but the total precision is 237 bits (approximately 71 decimal digits: {{nowrap|log<sub>10</sub>(2<sup>237</sup>) ≈ 71.344}}).
This gives from 71 to 73 significant decimal digits of precision. (If a decimal string with at most 71 significant decimal digits is converted to IEEE 754 octuple precision and then converted back to the same number of significant decimal digits, then the final string should match the original; and if an IEEE 754 octuple-precision number is converted to a decimal string with at least 73 significant decimal digits and then converted back to octuple precision, then the final number must match the original.<ref name=whyieee>{{cite web|url=http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF|title=Lecture Notes on the Status of IEEE Standard 754 for Binary Floating-Point Arithmetic| author=William Kahan |date=1 October 1987}}</ref>)
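These digit counts follow from the 237-bit significand alone. A short sketch in plain Python recomputes them from the format parameters (the constant and variable names here are illustrative, not from any standard library):

```python
# Decimal-precision figures implied by binary256's 237-bit significand.
from math import floor, log10

PRECISION_BITS = 237  # 236 stored bits + 1 implicit leading bit

# Equivalent decimal precision: log10(2**237) ~ 71.344 digits.
equivalent_digits = PRECISION_BITS * log10(2)

# Largest digit count that always survives a decimal -> binary256 -> decimal trip.
safe_digits = floor((PRECISION_BITS - 1) * log10(2))

# Smallest digit count guaranteeing a binary256 -> decimal -> binary256 trip survives.
needed_digits = floor(PRECISION_BITS * log10(2) + 1) + 1

print(round(equivalent_digits, 3), safe_digits, needed_digits)  # 71.344 71 73
```

The same two formulas give the familiar 15/17 digits for double precision and 33/36 digits for quadruple precision.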
<!-- (Commented out since the image is incorrect; it could be re-added once corrected.)
The bits are laid out as follows:

[[File:Octuple persision visual demontration.png|1000px|Octuple precision visual demonstration]]
-->
 
=== Exponent encoding ===
The octuple-precision binary floating-point exponent is encoded using an [[offset binary]] representation, with the zero offset being 262143, also known as the exponent bias in the IEEE 754 standard.
 
* E<sub>min</sub> = −262142
* E<sub>max</sub> = 262143
* [[Exponent bias]] = 3FFFF<sub>16</sub> = 262143
 
Thus, as defined by the offset binary representation, in order to get the true exponent the offset of 262143 has to be subtracted from the stored exponent.
 
The stored exponents 00000<sub>16</sub> and 7FFFF<sub>16</sub> are interpreted specially.
 
{|class="wikitable" style="text-align:center"
! Exponent !! Significand zero !! Significand non-zero !! Equation
|-
| 00000<sub>16</sub> || [[0 (number)|0]], [[−0]] || [[subnormal numbers]] || <math>(-1)^{\text{signbit}} \times 2^{-262142} \times 0.\text{significandbits}_2</math>
|-
| 00001<sub>16</sub>, ..., 7FFFE<sub>16</sub> ||colspan=2| normalized value || <math>(-1)^{\text{signbit}} \times 2^{{\text{exponentbits}_2} - 262143} \times 1.\text{significandbits}_2</math>
|-
| 7FFFF<sub>16</sub> || ±[[infinity|∞]] || [[NaN]] (quiet, signalling)
|}
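The encoding rules in the table can be exercised with a short Python sketch that decodes a raw 256-bit pattern into an exact value via arbitrary-precision integers and <code>Fraction</code>. Python has no native binary256 type, so the function name and structure here are illustrative only:

```python
# Decode a raw 256-bit binary256 pattern into an exact value (sketch).
from fractions import Fraction

EXP_BITS, FRAC_BITS = 19, 236
BIAS = (1 << (EXP_BITS - 1)) - 1          # 262143 = 3FFFF hex

def decode_binary256(bits):
    sign = -1 if (bits >> 255) & 1 else 1
    exponent = (bits >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
    fraction = bits & ((1 << FRAC_BITS) - 1)
    if exponent == (1 << EXP_BITS) - 1:   # 7FFFF: infinity or NaN
        return float("nan") if fraction else sign * float("inf")
    if exponent == 0:                     # 00000: zero or subnormal, exponent Emin
        return sign * Fraction(fraction, 1 << FRAC_BITS) * Fraction(2) ** (1 - BIAS)
    significand = (1 << FRAC_BITS) | fraction   # restore the implicit leading 1
    return sign * Fraction(significand, 1 << FRAC_BITS) * Fraction(2) ** (exponent - BIAS)

print(decode_binary256(0x3FFFF << 236))   # 1 (exponent field equals the bias)
```

Note that decoding the −0 pattern yields an unsigned <code>Fraction(0)</code>, since exact rationals do not distinguish signed zeros.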
 
The minimum strictly positive (subnormal) value is {{nowrap|2<sup>−262378</sup> ≈ 10<sup>−78984</sup>}} and has a precision of only one bit.
The minimum positive normal value is 2<sup>−262142</sup> ≈ 2.4824 × 10<sup>−78913</sup> and has a precision of 236&nbsp;bits, i.e. ±2<sup>−262378</sup> as well.
The maximum representable value is 2<sup>262144</sup> − 2<sup>261907</sup> ≈ 1.6113 × 10<sup>78913</sup>.
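These decimal magnitudes can be sanity-checked from the format parameters alone, without any binary256 arithmetic, using exact integer exponents and log<sub>10</sub> (a plain-Python sketch; the variable names are ours):

```python
# Decimal magnitudes of the extreme binary256 values, from format parameters.
from math import log10

FRAC_BITS = 236
BIAS = 262143
EMIN, EMAX = 1 - BIAS, BIAS               # -262142 and 262143

min_subnormal = EMIN - FRAC_BITS          # value is 2**-262378
min_normal = EMIN                         # value is 2**-262142
max_value = EMAX + 1                      # max is just under 2**262144

for name, e in [("subnormal min", min_subnormal),
                ("normal min", min_normal),
                ("max", max_value)]:
    d = e * log10(2)                      # decimal exponent of 2**e
    print(name, f"~ {10 ** (d % 1):.4f}e{int(d // 1)}")
```

Running this reproduces the figures above: roughly 10<sup>−78984</sup>, 2.4824 × 10<sup>−78913</sup>, and 1.6113 × 10<sup>78913</sup> (the 2<sup>261907</sup> term subtracted from the maximum is far too small to affect four significant digits).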
 
=== Octuple-precision examples ===
8000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 = −0
 
7fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 = +infinity
ffff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 = −infinity
 
By default, 1/3 rounds down like [[double precision]], because of the odd number of bits in the significand.
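This rounding claim can be verified with exact rational arithmetic: the binary expansion of 1/3 is 0.010101…, and with 237 significand bits the first discarded bit is a 0, so round-to-nearest truncates. A sketch in plain Python (no binary256 support is assumed):

```python
# Check that 1/3 rounds down under round-to-nearest in binary256.
from fractions import Fraction

PRECISION = 237                          # significand bits incl. the implicit 1

# 1/3 = 1.0101...b * 2**-2; scale so the integer part holds all 237 bits.
scaled = Fraction(1, 3) * 2 ** (PRECISION + 1)
discarded_tail = scaled - int(scaled)    # everything below the last kept bit

# A tail below one half means the nearest representable value is the truncation.
print(discarded_tail < Fraction(1, 2))   # True: binary256 rounds 1/3 down
```

Replacing 237 with an even precision such as single precision's 24 flips the tail to 2/3, which is why 1/3 rounds up in those formats instead.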