Octuple-precision floating-point format: Difference between revisions

== IEEE 754 octuple-precision binary floating-point format: binary256 ==
 
The [[IEEE 754]] standard specifies a '''binary256''' format among the ''interchange formats'' (it is not a basic format), as having:
* [[Sign bit]]: 1 bit
* [[Exponent]] width: 19 bits
* [[Significand]] [[precision (arithmetic)|precision]]: 237 bits (236 explicitly stored)
<!-- "significand", with a d at the end, is a technical term, please do not confuse with "significant" -->
 
The format is written with an implicit lead bit with value 1 unless the exponent is all zeros. Thus only 236 bits of the [[significand]] appear in the memory format, but the total precision is 237 bits (approximately 71 decimal digits: {{nowrap|log<sub>10</sub>(2<sup>237</sup>) ≈ 71.344}}).
This gives from 71 to 73 significant decimal digits of precision. (If a decimal string with at most 71 significant decimal digits is converted to IEEE 754 octuple precision and then converted back to the same number of significant decimal digits, then the final string should match the original; and if an IEEE 754 octuple-precision number is converted to a decimal string with at least 73 significant decimal digits and then converted back to octuple precision, then the final number must match the original.<ref name=whyieee>{{cite web|url=http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF|title=Lecture Notes on the Status of IEEE Standard 754 for Binary Floating-Point Arithmetic| author=William Kahan |date=1 October 1987}}</ref>)
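These digit counts follow from the 237-bit significand alone. A short sketch in plain Python recomputes them from the format parameters (the constant and variable names here are illustrative, not from any standard library):

```python
# Decimal-precision figures implied by binary256's 237-bit significand.
from math import floor, log10

PRECISION_BITS = 237  # 236 stored bits + 1 implicit leading bit

# Equivalent decimal precision: log10(2**237) ~ 71.344 digits.
equivalent_digits = PRECISION_BITS * log10(2)

# Largest digit count that always survives a decimal -> binary256 -> decimal trip.
safe_digits = floor((PRECISION_BITS - 1) * log10(2))

# Smallest digit count guaranteeing a binary256 -> decimal -> binary256 trip survives.
needed_digits = floor(PRECISION_BITS * log10(2) + 1) + 1

print(round(equivalent_digits, 3), safe_digits, needed_digits)  # 71.344 71 73
```

The same two formulas give the familiar 15/17 digits for double precision and 33/36 digits for quadruple precision.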
<!-- (Commented out since the image is incorrect; it could be re-added once corrected.)
The bits are laid out as follows:

[[File:Octuple persision visual demontration.png|1000px|Octuple precision visual demonstration]]
-->
 
=== Exponent encoding ===
The octuple-precision binary floating-point exponent is encoded using an [[offset binary]] representation, with the zero offset being 262143, also known as the exponent bias in the IEEE 754 standard.
 
* E<sub>min</sub> = −262142
* E<sub>max</sub> = 262143
* [[Exponent bias]] = 3FFFF<sub>16</sub> = 262143
 
Thus, as defined by the offset binary representation, in order to get the true exponent the offset of 262143 has to be subtracted from the stored exponent.
 
The stored exponents 00000<sub>16</sub> and 7FFFF<sub>16</sub> are interpreted specially.
 
{|class="wikitable" style="text-align:center"
! Exponent !! Significand zero !! Significand non-zero !! Equation
|-
| 00000<sub>16</sub> || [[0 (number)|0]], [[−0]] || [[subnormal numbers]] || <math>(-1)^{\text{signbit}} \times 2^{-262142} \times 0.\text{significandbits}_2</math>
|-
| 00001<sub>16</sub>, ..., 7FFFE<sub>16</sub> ||colspan=2| normalized value || <math>(-1)^{\text{signbit}} \times 2^{{\text{exponentbits}_2} - 262143} \times 1.\text{significandbits}_2</math>
|-
| 7FFFF<sub>16</sub> || ±[[infinity|∞]] || [[NaN]] (quiet, signalling)
|}
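The encoding rules in the table can be exercised with a short Python sketch that decodes a raw 256-bit pattern into an exact value via arbitrary-precision integers and <code>Fraction</code>. Python has no native binary256 type, so the function name and structure here are illustrative only:

```python
# Decode a raw 256-bit binary256 pattern into an exact value (sketch).
from fractions import Fraction

EXP_BITS, FRAC_BITS = 19, 236
BIAS = (1 << (EXP_BITS - 1)) - 1          # 262143 = 3FFFF hex

def decode_binary256(bits):
    sign = -1 if (bits >> 255) & 1 else 1
    exponent = (bits >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
    fraction = bits & ((1 << FRAC_BITS) - 1)
    if exponent == (1 << EXP_BITS) - 1:   # 7FFFF: infinity or NaN
        return float("nan") if fraction else sign * float("inf")
    if exponent == 0:                     # 00000: zero or subnormal, exponent Emin
        return sign * Fraction(fraction, 1 << FRAC_BITS) * Fraction(2) ** (1 - BIAS)
    significand = (1 << FRAC_BITS) | fraction   # restore the implicit leading 1
    return sign * Fraction(significand, 1 << FRAC_BITS) * Fraction(2) ** (exponent - BIAS)

print(decode_binary256(0x3FFFF << 236))   # 1 (exponent field equals the bias)
```

Note that decoding the −0 pattern yields an unsigned <code>Fraction(0)</code>, since exact rationals do not distinguish signed zeros.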
 
The minimum strictly positive (subnormal) value is {{nowrap|2<sup>−262378</sup> ≈ 10<sup>−78984</sup>}} and has a precision of only one bit.
The minimum positive normal value is 2<sup>−262142</sup> ≈ 2.4824 × 10<sup>−78913</sup> and has a precision of 236&nbsp;bits, i.e. ±2<sup>−262378</sup> as well.
The maximum representable value is 2<sup>262144</sup> − 2<sup>261907</sup> ≈ 1.6113 × 10<sup>78913</sup>.
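These decimal magnitudes can be sanity-checked from the format parameters alone, without any binary256 arithmetic, using exact integer exponents and log<sub>10</sub> (a plain-Python sketch; the variable names are ours):

```python
# Decimal magnitudes of the extreme binary256 values, from format parameters.
from math import log10

FRAC_BITS = 236
BIAS = 262143
EMIN, EMAX = 1 - BIAS, BIAS               # -262142 and 262143

min_subnormal = EMIN - FRAC_BITS          # value is 2**-262378
min_normal = EMIN                         # value is 2**-262142
max_value = EMAX + 1                      # max is just under 2**262144

for name, e in [("subnormal min", min_subnormal),
                ("normal min", min_normal),
                ("max", max_value)]:
    d = e * log10(2)                      # decimal exponent of 2**e
    print(name, f"~ {10 ** (d % 1):.4f}e{int(d // 1)}")
```

Running this reproduces the figures above: roughly 10<sup>−78984</sup>, 2.4824 × 10<sup>−78913</sup>, and 1.6113 × 10<sup>78913</sup> (the 2<sup>261907</sup> term subtracted from the maximum is far too small to affect four significant digits).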
 
=== Octuple-precision examples ===
8000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 = −0
 
7fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 = +infinity
ffff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 = −infinity
 
By default, 1/3 rounds down like [[double precision]], because of the odd number of bits in the significand.
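This rounding claim can be verified with exact rational arithmetic: the binary expansion of 1/3 is 0.010101…, and with 237 significand bits the first discarded bit is a 0, so round-to-nearest truncates. A sketch in plain Python (no binary256 support is assumed):

```python
# Check that 1/3 rounds down under round-to-nearest in binary256.
from fractions import Fraction

PRECISION = 237                          # significand bits incl. the implicit 1

# 1/3 = 1.0101...b * 2**-2; scale so the integer part holds all 237 bits.
scaled = Fraction(1, 3) * 2 ** (PRECISION + 1)
discarded_tail = scaled - int(scaled)    # everything below the last kept bit

# A tail below one half means the nearest representable value is the truncation.
print(discarded_tail < Fraction(1, 2))   # True: binary256 rounds 1/3 down
```

Replacing 237 with an even precision such as single precision's 24 flips the tail to 2/3, which is why 1/3 rounds up in those formats instead.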