Quadruple-precision floating-point format

 
== Double-double arithmetic ==
A common software technique to implement nearly quadruple precision using ''pairs'' of [[double-precision]] values is sometimes called '''double-double arithmetic'''.<ref name=Hida>Yozo Hida, X. Li, and D. H. Bailey, [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.5769 Quad-Double Arithmetic: Algorithms, Implementation, and Application], Lawrence Berkeley National Laboratory Technical Report LBNL-46996 (2000). Also Y. Hida et al., [https://web.mit.edu/tabbott/Public/quaddouble-debian/qd-2.3.4-old/docs/qd.pdf Library for double-double and quad-double arithmetic] (2007).</ref><ref name="Shewchuk">J. R. Shewchuk, [https://www.cs.cmu.edu/~quake/robust.html Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates], [[Discrete & Computational Geometry]] 18: 305–363, 1997.</ref><ref name="Knuth-4.2.3-pr9">{{cite book |last=Knuth |first=D. E. |title=The Art of Computer Programming |edition=2nd |at=chapter 4.2.3. problem 9. }}</ref> Using pairs of IEEE double-precision values with 53-bit significands, double-double arithmetic provides operations on numbers with significands of at least<ref name=Hida/> {{nowrap|1=2 × 53 = 106 bits}} (actually 107 bits<ref>Robert Munafo. [https://mrob.com/pub/math/f161.html F107 and F161 High-Precision Floating-Point Data Types] (2011).</ref> except for some of the largest values, due to the limited exponent range), only slightly less precise than the 113-bit significand of IEEE binary128 quadruple precision. The range of a double-double remains essentially the same as the double-precision format because the exponent still has 11 bits,<ref name=Hida /> significantly lower than the 15-bit exponent of IEEE quadruple precision (a range of {{nowrap|1.8 × 10<sup>308</sup>}} for double-double versus {{nowrap|1.2 × 10<sup>4932</sup>}} for binary128).
 
In particular, a double-double/quadruple-precision value ''q'' in the double-double technique is represented implicitly as a sum {{nowrap|1=''q'' = ''x'' + ''y''}} of two double-precision values ''x'' and ''y'', each of which supplies half of ''q''<nowiki/>'s significand.<ref name=Shewchuk/> That is, the pair {{nowrap|(''x'', ''y'')}} is stored in place of ''q'', and operations on ''q'' values {{nowrap|(+, −, ×, ...)}} are transformed into equivalent (but more complicated) operations on the ''x'' and ''y'' values. Thus, arithmetic in this technique reduces to a sequence of double-precision operations; since double-precision arithmetic is commonly implemented in hardware, double-double arithmetic is typically substantially faster than more general [[arbitrary-precision arithmetic]] techniques.<ref name=Hida/><ref name=Shewchuk/>
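The key building block of such transformations is an error-free addition such as Knuth's two-sum, which recovers the exact rounding error of a double-precision addition. The following is a minimal illustrative sketch in Python (whose <code>float</code> is an IEEE double); the function names are invented for this example, and production libraries such as QD use more careful renormalization:

```python
def two_sum(a, b):
    """Knuth's error-free transformation: returns (s, e) such that
    s + e == a + b exactly, where s is the rounded double-precision
    sum and e is the rounding error."""
    s = a + b
    v = s - a
    e = (a - (s - v)) + (b - v)
    return s, e

def dd_add(x, y):
    """Add two double-double values x = (hi, lo) and y = (hi, lo).
    A simple variant for illustration only."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return two_sum(s, e)  # renormalize so the low part is small
```

For example, <code>dd_add((1.0, 0.0), (2**-60, 0.0))</code> returns <code>(1.0, 2**-60)</code>: the low-order bit that plain double addition would discard (since <code>1.0 + 2**-60 == 1.0</code> in double precision) is preserved in the second component.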
Beyond double-double arithmetic, triple-double or quad-double arithmetic can be constructed when higher precision is required but no higher-precision floating-point library is available. A number is then represented as a sum of three (or four) double-precision values, giving significands of at least 159/161 and 212/215 bits respectively. A natural extension to an arbitrary number of terms (though limited by the exponent range) is called ''floating-point expansions''.
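The expansion idea can be illustrated with the grow-expansion algorithm from Shewchuk's paper cited above, which adds a single double to an expansion with no rounding error at all. A hedged Python sketch (function names chosen for this example):

```python
from fractions import Fraction  # used only to verify exactness below

def two_sum(a, b):
    # Knuth's error-free transformation: s + e == a + b exactly
    s = a + b
    v = s - a
    e = (a - (s - v)) + (b - v)
    return s, e

def grow_expansion(e, b):
    """Add the double b to the expansion e (components ordered by
    increasing magnitude); the returned expansion's components sum
    exactly to b + sum(e)."""
    h = []
    q = b
    for comp in e:
        q, err = two_sum(q, comp)
        h.append(err)
    h.append(q)
    return h
```

For instance, <code>grow_expansion([2**-80, 1.0], 3.0)</code> yields a three-component expansion whose components sum exactly to {{nowrap|4 + 2<sup>−80</sup>}}, a value no single double can represent; repeated application builds the triple-double and quad-double formats described above.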
 
A similar technique can be used to produce '''double-quad arithmetic''', in which a number is represented as a sum of two quadruple-precision values. It provides operations on numbers with significands of at least 226 (or 227) bits.<ref>sourceware.org [https://sourceware.org/legacy-ml/libc-alpha/2012-03/msg01024.html Re: The state of glibc libm]</ref>
 
== Implementations ==