These examples are given in bit ''representation'', in [[hexadecimal]], of the floating-point value. This includes the sign, (biased) exponent, and significand.

<pre>
0000 0000 0000 0000 0000 0000 0000 0001<sub>16</sub> = 2<sup>−16382</sup> × 2<sup>−112</sup> = 2<sup>−16494</sup>
                                          ≈ 6.4751751194380251109244389582276465525 × 10<sup>−4966</sup>
                                          (smallest positive subnormal number)
</pre>

<pre>
0000 ffff ffff ffff ffff ffff ffff ffff<sub>16</sub> = 2<sup>−16382</sup> × (1 − 2<sup>−112</sup>)
                                          ≈ 3.3621031431120935062626778173217519551 × 10<sup>−4932</sup>
                                          (largest subnormal number)
</pre>

<pre>
0001 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = 2<sup>−16382</sup>
                                          ≈ 3.3621031431120935062626778173217526026 × 10<sup>−4932</sup>
                                          (smallest positive normal number)
</pre>

<pre>
7ffe ffff ffff ffff ffff ffff ffff ffff<sub>16</sub> = 2<sup>16383</sup> × (2 − 2<sup>−112</sup>)
                                          ≈ 1.1897314953572317650857593266280070162 × 10<sup>4932</sup>
                                          (largest normal number)
</pre>

<pre>
3ffe ffff ffff ffff ffff ffff ffff ffff<sub>16</sub> = 1 − 2<sup>−113</sup>
                                          ≈ 0.9999999999999999999999999999999999037
                                          (largest number less than one)
</pre>

<pre>
3fff 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = 1 (one)
</pre>

<pre>
3fff 0000 0000 0000 0000 0000 0000 0001<sub>16</sub> = 1 + 2<sup>−112</sup>
                                          ≈ 1.0000000000000000000000000000000001926
                                          (smallest number larger than one)
</pre>

<pre>
4000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = 2
c000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = −2
</pre>

<pre>
0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = 0
8000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = −0
</pre>

<pre>
7fff 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = infinity
ffff 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = −infinity
</pre>

<pre>
4000 921f b544 42d1 8469 898c c517 01b8<sub>16</sub> ≈ 3.1415926535897932384626433832795027975
                                          (closest approximation to π)
</pre>
<pre>
3ffd 5555 5555 5555 5555 5555 5555 5555<sub>16</sub> ≈ 0.3333333333333333333333333333333333173
                                          (closest approximation to 1/3)
</pre>

For the [[C (programming language)|C programming language]], ISO/IEC TS 18661-3 (floating-point extensions for C, interchange and extended types) specifies <code>_Float128</code> as the type implementing the IEEE 754 quadruple-precision format (binary128).<ref>{{cite web|title=ISO/IEC TS 18661-3|url=https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1945.pdf|date=2015-06-10|access-date=2019-09-22}}</ref> Alternatively, on a few systems and compilers, quadruple precision in [[C (programming language)|C]]/[[C++]] may be specified by the [[long double]] type, but this is neither required by the language (which only requires <code>long double</code> to be at least as precise as <code>double</code>) nor common. As of [[C++23]], the C++ language defines a <code>&lt;stdfloat&gt;</code> header that contains fixed-width floating-point types. Support for these types is optional, but where it is provided, <code>std::float128_t</code> corresponds to quadruple precision. On x86 and x86-64, the most common C/C++ compilers implement <code>long double</code> either as 80-bit [[extended precision]] (e.g. the [[GNU C Compiler]] gcc<ref>[https://web.archive.org/web/20080713131713/https://gcc.gnu.org/onlinedocs/gcc/i386-and-x86_002d64-Options.html i386 and x86-64 Options (archived copy on web.archive.org)], ''Using the GNU Compiler Collection''.</ref> and the [[Intel C++ Compiler]] with a <code>/Qlong‑double</code> switch<ref>[http://software.intel.com/en-us/articles/size-of-long-integer-type-on-different-architecture-and-os/ Intel Developer Site].</ref>) or simply as a synonym for double precision (e.g. [[Microsoft Visual C++]]<ref>[http://msdn.microsoft.com/en-us/library/9cx8xs15.aspx MSDN homepage, about Visual C++ compiler].</ref>), rather than as quadruple precision.
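The hexadecimal bit patterns listed earlier in this section can be checked mechanically without any quadruple-precision type. The following is a minimal C sketch, not taken from any library, that splits a binary128 bit pattern into its sign, 15-bit biased exponent and 112-bit fraction, classifies it, and approximates normal values as a <code>double</code>. It assumes the 128 bits are held as two 64-bit halves in the order the hex digits are written; the names <code>bits128</code>, <code>b128_decode</code> and <code>b128_normal_to_double</code> are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not a library API.
   hi = first 16 hex digits (sign, exponent, top 48 fraction bits), lo = last 16. */
typedef struct { uint64_t hi, lo; } bits128;

typedef enum { B128_ZERO, B128_SUBNORMAL, B128_NORMAL, B128_INF, B128_NAN } b128_class;

typedef struct {
    int sign;          /* 1 if the sign bit is set */
    int biased_exp;    /* 15-bit exponent field, 0..32767 */
    uint64_t frac_hi;  /* top 48 of the 112 stored fraction bits */
    uint64_t frac_lo;  /* low 64 fraction bits */
    b128_class cls;
} b128_fields;

#define B128_BIAS 16383

static b128_fields b128_decode(bits128 b) {
    b128_fields f;
    f.sign = (int)(b.hi >> 63);
    f.biased_exp = (int)((b.hi >> 48) & 0x7fff);
    f.frac_hi = b.hi & UINT64_C(0xffffffffffff);
    f.frac_lo = b.lo;
    int frac_is_zero = (f.frac_hi == 0 && f.frac_lo == 0);
    if (f.biased_exp == 0)              /* zero or subnormal */
        f.cls = frac_is_zero ? B128_ZERO : B128_SUBNORMAL;
    else if (f.biased_exp == 0x7fff)    /* all-ones exponent */
        f.cls = frac_is_zero ? B128_INF : B128_NAN;
    else
        f.cls = B128_NORMAL;
    return f;
}

/* Approximates a NORMAL value as a double. Only about 53 significant bits
   survive, and the unbiased exponent must fit double's range, so this works
   for values such as pi and 1/3 but not for the extreme examples. */
static double b128_normal_to_double(b128_fields f) {
    const double two48 = 281474976710656.0;        /* 2^48 */
    const double two64 = 18446744073709551616.0;   /* 2^64 */
    double v = 1.0                                 /* implicit leading bit */
             + (double)f.frac_hi / two48           /* fraction bits 1..48 */
             + (double)f.frac_lo / two64 / two48;  /* fraction bits 49..112 */
    int e = f.biased_exp - B128_BIAS;              /* unbiased exponent */
    for (; e > 0; e--) v *= 2.0;
    for (; e < 0; e++) v /= 2.0;
    return f.sign ? -v : v;
}
```

For instance, decoding the π pattern (<code>hi = 0x4000921fb54442d1</code>, <code>lo = 0x8469898cc51701b8</code>) gives an unbiased exponent of 1, and the <code>double</code> approximation agrees with the double-precision value of π to within rounding. The largest normal number, 2<sup>16383</sup> × (2 − 2<sup>−112</sup>), can be classified this way but not converted, since it lies far outside the range of <code>double</code>.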
The procedure call standard for the [[ARM architecture#AArch64|ARM 64-bit architecture]] (AArch64) specifies that <code>long double</code> corresponds to the IEEE 754 quadruple-precision format.<ref>{{cite web|title=Procedure Call Standard for the ARM 64-bit Architecture (AArch64)|url=http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf|date=2013-05-22|access-date=2019-09-22|archive-url=https://web.archive.org/web/20191016000704/http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf|archive-date=2019-10-16|url-status=dead}}</ref> On a few other architectures, some C/C++ compilers implement <code>long double</code> as quadruple precision, e.g. gcc on [[PowerPC]] (as double-double,<ref>[https://gcc.gnu.org/onlinedocs/gcc/RS_002f6000-and-PowerPC-Options.html RS/6000 and PowerPC Options], ''Using the GNU Compiler Collection''.</ref><ref>[https://developer.apple.com/legacy/mac/library/documentation/Performance/Conceptual/Mac_OSX_Numerics/Mac_OSX_Numerics.pdf Inside Macintosh – PowerPC Numerics]. {{webarchive|url=https://web.archive.org/web/20121009191824/http://developer.apple.com/legacy/mac/library/documentation/Performance/Conceptual/Mac_OSX_Numerics/Mac_OSX_Numerics.pdf|date=October 9, 2012}}.</ref><ref>[https://opensource.apple.com/source/gcc/gcc-5646/gcc/config/rs6000/darwin-ldouble.c 128-bit long double support routines for Darwin].</ref> a pair of double-precision values rather than the IEEE binary128 encoding) and [[SPARC]],<ref>[https://gcc.gnu.org/onlinedocs/gcc/SPARC-Options.html SPARC Options], ''Using the GNU Compiler Collection''.</ref> or the [[Sun Studio (software)|Sun Studio compilers]] on SPARC.<ref>[http://docs.oracle.com/cd/E19422-01/819-3693/ncg_lib.html The Math Libraries], Sun Studio 11 ''Numerical Computation Guide'' (2005).</ref> Even where <code>long double</code> is not quadruple precision, some C/C++ compilers provide a nonstandard quadruple-precision type as an extension.
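Which of these cases applies on a given platform can be inferred, with caveats, from the <code>&lt;float.h&gt;</code> macro <code>LDBL_MANT_DIG</code>, the number of significand bits of <code>long double</code>. The minimal sketch below assumes the usual widths: 113 for binary128, 64 for x87 extended precision, 53 when <code>long double</code> is the same as <code>double</code>, and 106 for gcc's PowerPC double-double. The C standard reports only the width, not the encoding, so this mapping is a heuristic, and the names <code>ld_format</code> and <code>long_double_format</code> are hypothetical.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical classification of long double formats by significand width. */
typedef enum {
    LD_SAME_AS_DOUBLE,  /* 53 bits: e.g. Microsoft Visual C++ */
    LD_X87_EXTENDED,    /* 64 bits: 80-bit extended precision on x86 */
    LD_DOUBLE_DOUBLE,   /* 106 bits: pair of doubles, gcc on PowerPC */
    LD_BINARY128,       /* 113 bits: IEEE 754 quadruple precision */
    LD_OTHER
} ld_format;

/* Maps a significand width in bits (as reported by LDBL_MANT_DIG) to a format.
   Heuristic only: equal widths do not guarantee an identical encoding. */
static ld_format long_double_format(int mant_dig) {
    switch (mant_dig) {
    case 53:  return LD_SAME_AS_DOUBLE;
    case 64:  return LD_X87_EXTENDED;
    case 106: return LD_DOUBLE_DOUBLE;
    case 113: return LD_BINARY128;
    default:  return LD_OTHER;
    }
}

/* The language only guarantees long double is at least as precise as double. */
_Static_assert(LDBL_MANT_DIG >= DBL_MANT_DIG, "long double narrower than double");

/* Format of long double on the target this file is compiled for. */
static ld_format this_target_long_double(void) {
    return long_double_format(LDBL_MANT_DIG);
}
```

Under these assumptions, <code>this_target_long_double()</code> would be expected to return <code>LD_X87_EXTENDED</code> with gcc on x86-64 Linux and <code>LD_BINARY128</code> on targets following the AArch64 procedure call standard, such as AArch64 Linux.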
For example, gcc provides a quadruple-precision type called <code>__float128</code> on x86, x86-64 and [[Itanium]] CPUs,<ref>[https://gcc.gnu.org/onlinedocs/gcc/Floating-Types.html Additional Floating Types], ''Using the GNU Compiler Collection''</ref> and on [[PowerPC]] as IEEE 128-bit floating point when the <code>-mfloat128-hardware</code> or <code>-mfloat128</code> option is used;<ref name=gcc6changes>{{cite web|title=GCC 6 Release Series - Changes, New Features, and Fixes|url=https://gcc.gnu.org/gcc-6/changes.html|access-date=2016-09-13}}</ref> some versions of Intel's C/C++ compiler for x86 and x86-64 supply a nonstandard quadruple-precision type called <code>_Quad</code>.<ref>[http://software.intel.com/en-us/forums/showthread.php?t=56359 Intel C++ Forums] (2007).</ref>
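The extra precision of <code>__float128</code> can be seen with the 1 + 2<sup>−112</sup> example given earlier. In the minimal sketch below (the function names are hypothetical), 1 + 2<sup>−n</sup> is computed in <code>double</code> and, when the compiler advertises <code>__float128</code> through the predefined macro <code>__SIZEOF_FLOAT128__</code> (an assumption about the toolchain; gcc defines it on targets such as x86-64), also in <code>__float128</code>. Only arithmetic and comparison are used, so no libquadmath functions are needed.

```c
#include <assert.h>

/* Hypothetical helper: 1 if 1 + 2^-n compares unequal to 1 in double. */
static int double_resolves(int n) {
    volatile double eps = 1.0;
    for (int i = 0; i < n; i++)
        eps /= 2;                     /* exact: eps == 2^-n */
    volatile double sum = 1.0 + eps;  /* volatile forces rounding to double */
    return sum != 1.0;
}

#ifdef __SIZEOF_FLOAT128__
/* Same test in gcc's nonstandard __float128 type (IEEE binary128). */
static int float128_resolves(int n) {
    __float128 eps = 1;
    for (int i = 0; i < n; i++)
        eps /= 2;                     /* exact: eps == 2^-n */
    __float128 sum = 1 + eps;         /* rounded to 113 significant bits */
    return sum != 1;
}
#endif
```

With binary128, 1 + 2<sup>−112</sup> is distinguishable from 1, while 1 + 2<sup>−113</sup> rounds back to 1 under round-to-nearest-even, consistent with a 113-bit significand (112 stored bits plus the implicit leading bit); <code>double</code> already loses 1 + 2<sup>−53</sup>. Printing a <code>__float128</code> value would need a library routine such as <code>quadmath_snprintf</code> from gcc's libquadmath (linked with <code>-lquadmath</code>).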
