Revision as of 11:07, 3 July 2025 by Vincent Lefèvre (→Computer-language support: simplification ("it" should have referred to what came before, not to what follows)), compared with revision as of 14:35, 10 July 2025 by 208.114.63.4 (space out to improve readability of markup, using pre tags makes it easier to identify preformatted text instead of spaces; tag: Reverted).
These examples are given in bit ''representation'', in [[hexadecimal]], of the floating-point value. This includes the sign, (biased) exponent, and significand.

 0000 0000 0000 0000 0000 0000 0000 0001<sub>16</sub> = 2<sup>−16382</sup> × 2<sup>−112</sup> = 2<sup>−16494</sup>
                                           ≈ 6.4751751194380251109244389582276465525 × 10<sup>−4966</sup>
                                             (smallest positive subnormal number)

 0000 ffff ffff ffff ffff ffff ffff ffff<sub>16</sub> = 2<sup>−16382</sup> × (1 − 2<sup>−112</sup>)
                                           ≈ 3.3621031431120935062626778173217519551 × 10<sup>−4932</sup>
                                             (largest subnormal number)

 0001 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = 2<sup>−16382</sup>
                                           ≈ 3.3621031431120935062626778173217526026 × 10<sup>−4932</sup>
                                             (smallest positive normal number)

 7ffe ffff ffff ffff ffff ffff ffff ffff<sub>16</sub> = 2<sup>16383</sup> × (2 − 2<sup>−112</sup>)
                                           ≈ 1.1897314953572317650857593266280070162 × 10<sup>4932</sup>
                                             (largest normal number)

 3ffe ffff ffff ffff ffff ffff ffff ffff<sub>16</sub> = 1 − 2<sup>−113</sup>
                                           ≈ 0.9999999999999999999999999999999999037
                                             (largest number less than one)

 3fff 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = 1 (one)

 3fff 0000 0000 0000 0000 0000 0000 0001<sub>16</sub> = 1 + 2<sup>−112</sup>
                                           ≈ 1.0000000000000000000000000000000001926
                                             (smallest number larger than one)

 4000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = 2
 c000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = −2

 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = 0
 8000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = −0

 7fff 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = infinity
 ffff 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = −infinity

 4000 921f b544 42d1 8469 898c c517 01b8<sub>16</sub> ≈ 3.1415926535897932384626433832795027975
                                             (closest approximation to π)

 3ffd 5555 5555 5555 5555 5555 5555 5555<sub>16</sub> ≈ 0.3333333333333333333333333333333333173
                                             (closest approximation to 1/3)

Under the default rounding mode (round to nearest, ties to even), 1/3 rounds down, as in [[double precision]]. This happens because the significand has an odd number of bits (113). The bits beyond the rounding point are therefore <code>0101...</code>, which is less than half a [[unit in the last place]].

A similar technique can be used to produce a '''double-quad arithmetic'''. In it, a value is represented as the sum of two quadruple-precision values, giving at least 226 (or 227) bits of precision.<ref>sourceware.org [http://sourceware.org/ml/libc-alpha/2012-03/msg01024.html Re: The state of glibc libm]</ref>

== Implementations ==
Quadruple precision is often implemented in software by a variety of techniques. One example is the double-double technique above, although that technique does not implement IEEE quadruple precision. Software is used because direct hardware support for quadruple precision is, {{as of|2016|lc=on}}, less common (see "[[#Hardware support|Hardware support]]" below). General [[arbitrary-precision arithmetic]] libraries can be used to obtain quadruple (or higher) precision, but specialized quadruple-precision implementations may achieve higher performance.

=== Computer-language support ===
A separate question is the extent to which quadruple-precision types are directly incorporated into computer [[programming language]]s.
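A language without a built-in quadruple-precision type can still handle the format by emulating it in software on top of arbitrary-precision arithmetic, as described under Implementations. Python's built-in <code>float</code>, for instance, is double precision.

The following Python sketch illustrates the idea. It is a minimal example under stated assumptions, not a reference implementation:
* <code>encode</code> and <code>decode</code> are hypothetical helper names, not part of any library.
* Only round-to-nearest, ties-to-even is implemented, and NaN is not accepted as input.
* Negative zero decodes as plain 0, because a <code>Fraction</code> has no signed zero.

Exact rationals from the standard <code>fractions</code> module stand in for an arbitrary-precision library. This is enough to reproduce the example encodings listed earlier, including the rounding of 1/3 down to <code>3ffd 5555 5555 5555 5555 5555 5555 5555</code>.

```python
from fractions import Fraction

# IEEE 754 binary128 layout: 1 sign bit, 15 exponent bits, 112 stored
# significand bits (113 bits of precision with the implicit leading 1).
FRAC_BITS = 112
EXP_ALL_ONES = 0x7FFF  # exponent field of infinities and NaNs
BIAS = 16383           # exponent bias
EMIN = 1 - BIAS        # -16382, exponent of the smallest normal number


def decode(bits: int):
    """Hypothetical helper: exact value of a binary128 bit pattern.

    Finite values are returned as exact Fractions (so -0 decodes to 0);
    infinities and NaNs are returned as Python floats.
    """
    sign = -1 if bits >> 127 else 1
    exp = (bits >> FRAC_BITS) & EXP_ALL_ONES
    frac = bits & ((1 << FRAC_BITS) - 1)
    if exp == EXP_ALL_ONES:
        return sign * float("inf") if frac == 0 else float("nan")
    if exp == 0:
        # Zero or subnormal: no implicit leading 1, exponent fixed at EMIN.
        significand = Fraction(frac, 1 << FRAC_BITS)
        exp = EMIN
    else:
        significand = 1 + Fraction(frac, 1 << FRAC_BITS)
        exp -= BIAS
    return sign * significand * Fraction(2) ** exp


def encode(x) -> int:
    """Hypothetical helper: round a rational to binary128 (ties to even).

    Returns the 128-bit pattern; overflow gives infinity. NaN is not handled.
    """
    x = Fraction(x)
    sign = 0
    if x < 0:
        sign, x = 1, -x
    if x == 0:
        return sign << 127
    # Find e with 2**e <= x < 2**(e + 1) from the bit lengths of x's parts.
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    e = max(e, EMIN)  # below the normal range the exponent stays at EMIN
    # Scale so the significand is an integer of up to 113 bits, then round;
    # round() on a Fraction rounds half to even.
    m = round(x / Fraction(2) ** (e - FRAC_BITS))
    if m == 1 << (FRAC_BITS + 1):  # rounding carried into the next binade
        m >>= 1
        e += 1
    if e > BIAS:  # larger than the largest finite number: overflow
        return (sign << 127) | (EXP_ALL_ONES << FRAC_BITS)
    if m < 1 << FRAC_BITS:  # subnormal (or rounded to zero): field stays 0
        return (sign << 127) | m
    return (sign << 127) | ((e + BIAS) << FRAC_BITS) | (m - (1 << FRAC_BITS))


# Usage: 1/3 rounds down to the pattern listed in the examples, and the
# smallest positive subnormal pattern decodes to exactly 2**-16494.
assert encode(Fraction(1, 3)) == 0x3ffd_5555_5555_5555_5555_5555_5555_5555
assert decode(1) == Fraction(1, 2**16494)
```

Exact rationals keep the sketch short and make every rounding decision explicit, but they are slow. As noted above, specialized quadruple-precision code may achieve higher performance than general arbitrary-precision arithmetic.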
As of 2024, [[Rust (programming language)|Rust]] is working on adding a new <code>f128</code> type for IEEE quadruple-precision 128-bit floats.<ref>{{cite web |last1=Cross |first1=Travis |title=Tracking Issue for f16 and f128 float types |url=https://github.com/rust-lang/rust/issues/116909 |website=GitHub |access-date=2024-07-05}}</ref>

=== Libraries and toolboxes ===
* The [[GNU Compiler Collection|GCC]] quad-precision math library, [https://gcc.gnu.org/onlinedocs/libquadmath libquadmath], provides <code>__float128</code> and <code>__complex128</code> operations.
* The [[Boost (C++ libraries)|Boost]] multiprecision library Boost.Multiprecision provides a unified cross-platform C++ interface for <code>__float128</code> and <code>_Quad</code> types, and includes a custom implementation of the standard math library.<ref>{{cite web |title=Boost.Multiprecision – float128 |url=http://www.boost.org/doc/libs/1_58_0/libs/multiprecision/doc/html/boost_multiprecision/tut/floats/float128.html |access-date=2015-06-22}}</ref>

Quadruple-precision floating-point format: Difference between revisions