Quadruple-precision floating-point format: Difference between revisions

These examples are given in bit ''representation'', in [[hexadecimal]], of the floating-point value. This includes the sign, (biased) exponent, and significand.
 
<pre>
0000 0000 0000 0000 0000 0000 0000 0001<sub>16</sub> = 2<sup>−16382</sup> × 2<sup>−112</sup> = 2<sup>−16494</sup>
                                          ≈ 6.4751751194380251109244389582276465525 × 10<sup>−4966</sup>
                                          (smallest positive subnormal number)
</pre>

<pre>
0000 ffff ffff ffff ffff ffff ffff ffff<sub>16</sub> = 2<sup>−16382</sup> × (1 − 2<sup>−112</sup>)
                                          ≈ 3.3621031431120935062626778173217519551 × 10<sup>−4932</sup>
                                          (largest subnormal number)
</pre>

<pre>
0001 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = 2<sup>−16382</sup>
                                          ≈ 3.3621031431120935062626778173217526026 × 10<sup>−4932</sup>
                                          (smallest positive normal number)
</pre>

<pre>
7ffe ffff ffff ffff ffff ffff ffff ffff<sub>16</sub> = 2<sup>16383</sup> × (2 − 2<sup>−112</sup>)
                                          ≈ 1.1897314953572317650857593266280070162 × 10<sup>4932</sup>
                                          (largest normal number)
</pre>

<pre>
3ffe ffff ffff ffff ffff ffff ffff ffff<sub>16</sub> = 1 − 2<sup>−113</sup>
                                          ≈ 0.9999999999999999999999999999999999037
                                          (largest number less than one)
</pre>

<pre>
3fff 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = 1 (one)
</pre>

<pre>
3fff 0000 0000 0000 0000 0000 0000 0001<sub>16</sub> = 1 + 2<sup>−112</sup>
                                          ≈ 1.0000000000000000000000000000000001926
                                          (smallest number larger than one)
</pre>

<pre>
4000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = 2
c000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = −2
</pre>

<pre>
0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = 0
8000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = −0
</pre>

<pre>
7fff 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = infinity
ffff 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = −infinity
</pre>

<pre>
4000 921f b544 42d1 8469 898c c517 01b8<sub>16</sub> ≈ 3.1415926535897932384626433832795027975
                                          (closest approximation to π)
</pre>

<pre>
3ffd 5555 5555 5555 5555 5555 5555 5555<sub>16</sub> ≈ 0.3333333333333333333333333333333333173
                                          (closest approximation to 1/3)
</pre>
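
The decoding rules behind these examples can be checked mechanically. The Python sketch below is an illustration only and is not taken from any library; the function name <code>decode_binary128</code> and its constants are hypothetical. It splits a bit pattern into the sign bit, the 15-bit biased exponent (bias 16383) and the 112-bit trailing significand, then returns the exact value as a <code>Fraction</code>. Because <code>Fraction</code> has no signed zero, −0 decodes to 0 in this sketch, and infinities and NaN are returned as ordinary Python floats.

```python
from fractions import Fraction

# IEEE 754 binary128 layout: 1 sign bit, 15 exponent bits, 112 trailing significand bits
EXP_BITS, FRAC_BITS, BIAS = 15, 112, 16383
EXP_ALL_ONES = (1 << EXP_BITS) - 1

def decode_binary128(pattern: str):
    """Return the exact value of a binary128 bit pattern given in hex (spaces allowed).

    Finite values come back as Fraction; infinities and NaN as Python floats.
    Fraction has no signed zero, so -0 decodes to 0.
    """
    bits = int(pattern.replace(" ", ""), 16)
    sign = -1 if bits >> 127 else 1
    exponent = (bits >> FRAC_BITS) & EXP_ALL_ONES
    fraction = bits & ((1 << FRAC_BITS) - 1)
    if exponent == EXP_ALL_ONES:   # infinity (fraction 0) or NaN (fraction nonzero)
        return float("nan") if fraction else sign * float("inf")
    if exponent == 0:              # zero or subnormal: no implicit 1, exponent fixed at 1 - BIAS
        significand = Fraction(fraction, 1 << FRAC_BITS)
        return sign * significand * Fraction(2) ** (1 - BIAS)
    # normal number: implicit leading 1 in front of the 112 stored bits
    significand = Fraction((1 << FRAC_BITS) | fraction, 1 << FRAC_BITS)
    return sign * significand * Fraction(2) ** (exponent - BIAS)

print(decode_binary128("3ffe ffff ffff ffff ffff ffff ffff ffff") == 1 - Fraction(1, 2**113))  # True
print(float(decode_binary128("4000 921f b544 42d1 8469 898c c517 01b8")))  # 3.141592653589793
```

Exact rationals are used because Python's built-in <code>float</code> is binary64 and cannot represent most of these values exactly.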
 
By default, 1/3 rounds down, as in [[double precision]], because the significand has an odd number of bits (113, including the implicit leading bit). The bits beyond the rounding point are therefore <code>0101...</code>, which is less than 1/2 of a [[unit in the last place]].
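
This rounding argument can be reproduced with exact arithmetic. The following Python sketch is hypothetical (the name <code>encode_binary128</code> is not from any library) and handles only positive values in the normal range. It rounds an exact rational to the nearest binary128 value with ties to even, the IEEE 754 default rounding mode, and returns the bit pattern. For 1/3, the part discarded from the scaled significand is exactly 1/3 of a unit in the last place (binary 0.0101...), which is below the halfway point, so the value is rounded down.

```python
from fractions import Fraction

FRAC_BITS, BIAS = 112, 16383

def encode_binary128(x: Fraction) -> str:
    """Round a positive rational in the normal range to the nearest binary128 value
    (ties to even) and return its bit pattern as eight groups of four hex digits."""
    if x <= 0:
        raise ValueError("this sketch handles positive values only")
    # Find e with 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if x < Fraction(2) ** e:
        e -= 1
    # Scale so that the 113-bit significand (implicit 1 included) is the integer part.
    scaled = x / Fraction(2) ** (e - FRAC_BITS)
    sig = scaled.numerator // scaled.denominator
    rem = scaled - sig  # discarded part, measured in units in the last place
    # For 1/3, rem is exactly 1/3 (binary 0.0101...), below 1/2, so no round-up.
    if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and sig & 1):
        sig += 1
        if sig == 1 << (FRAC_BITS + 1):  # round-up carried into the next binade
            sig >>= 1
            e += 1
    biased = e + BIAS
    if not 1 <= biased <= 0x7FFE:
        raise ValueError("outside the normal binary128 range handled by this sketch")
    bits = (biased << FRAC_BITS) | (sig & ((1 << FRAC_BITS) - 1))
    digits = f"{bits:032x}"
    return " ".join(digits[i:i + 4] for i in range(0, 32, 4))

print(encode_binary128(Fraction(1, 3)))  # 3ffd 5555 5555 5555 5555 5555 5555 5555
```
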

A similar technique can be used to produce '''double-quad arithmetic''', in which a value is represented as the unevaluated sum of two quadruple-precision values. Such a pair can carry at least 226 (or 227) bits of precision.<ref>sourceware.org [http://sourceware.org/ml/libc-alpha/2012-03/msg01024.html Re: The state of glibc libm]</ref>
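
Arithmetic on such pairs is built from error-free transformations. The sketch below shows one of them, Knuth's TwoSum, in Python. It is only an illustration under a stated substitution: Python's <code>float</code> is binary64, so the sketch demonstrates the double-double case. A double-quad version would apply the same steps to binary128 components, and a complete implementation would also need multiplication and renormalization, which are omitted here.

```python
from fractions import Fraction

def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's TwoSum: s is the rounded sum and err its rounding error,
    so that a + b == s + err holds exactly (barring overflow)."""
    s = a + b
    b_virtual = s - a          # the part of b that made it into s
    a_virtual = s - b_virtual  # the part of a that made it into s
    err = (a - a_virtual) + (b - b_virtual)
    return s, err

# A (hi, lo) pair like this is the representation used by double-double arithmetic.
hi, lo = two_sum(0.1, 0.2)
assert Fraction(hi) + Fraction(lo) == Fraction(0.1) + Fraction(0.2)
```
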
 
== Implementations ==
Quadruple precision is often implemented in software by a variety of techniques (such as the double-double technique above, although that technique does not implement IEEE quadruple precision), since direct hardware support for quadruple precision is, {{as of|2016|lc=on}}, less common (see "[[#Hardware support|Hardware support]]" below). One can use general [[arbitrary-precision arithmetic]] libraries to obtain quadruple (or higher) precision, but specialized quadruple-precision implementations may achieve higher performance.
 
=== Computer-language support ===
A separate question is the extent to which quadruple-precision types are directly incorporated into computer [[programming language]]s.
 
As of 2024, [[Rust (programming language)|Rust]] is working on adding a new <code>f128</code> type for IEEE quadruple-precision 128-bit floats.<ref>{{cite web |last1=Cross |first1=Travis |title=Tracking Issue for f16 and f128 float types |url=https://github.com/rust-lang/rust/issues/116909 |website=GitHub |access-date=2024-07-05}}</ref>
 
=== Libraries and toolboxes ===
* The [[GNU Compiler Collection|GCC]] quad-precision math library, [https://gcc.gnu.org/onlinedocs/libquadmath libquadmath], provides <code>__float128</code> and <code>__complex128</code> operations.
* The [[Boost (C++ libraries)|Boost]] multiprecision library Boost.Multiprecision provides a unified cross-platform C++ interface for <code>__float128</code> and <code>_Quad</code> types, and includes a custom implementation of the standard math library.<ref>{{cite web |title=Boost.Multiprecision – float128 |url=http://www.boost.org/doc/libs/1_58_0/libs/multiprecision/doc/html/boost_multiprecision/tut/floats/float128.html |access-date=2015-06-22}}</ref>