Octuple-precision floating-point format

{{short description|256-bit computer number format}}
{{Use dmy dates|date=December 2022|cs1-dates=y}}
{{Use list-defined references|date=December 2022}}
 
{{Floating-point}}
{{Computer architecture bit widths}}
In [[computing]], '''octuple precision''' is a binary [[floating-point]]-based [[computer number format]] that occupies 32 [[byte]]s (256 [[bit]]s) in computer memory. This 256-bit octuple precision is for applications requiring results in higher than [[quadruple precision]]. The format is rarely used, and few environments support it.
 
The range greatly exceeds what is needed to describe all known physical limitations within the observable universe, or precisions finer than [[Planck units]].
 
== IEEE 754 octuple-precision binary floating-point format: binary256 ==
In its 2008 revision, the [[IEEE 754]] standard specifies a '''binary256''' format among the ''interchange formats'' (it is not a basic format), as having:
* [[Sign bit]]: 1 bit
* [[Exponent]] width: 19 bits
* [[Significand]] [[precision (arithmetic)|precision]]: 237 bits (236 explicitly stored)
<!-- "significand", with a d at the end, is a technical term, please do not confuse with "significant" -->
 
The format is written with an implicit lead bit with value 1 unless the exponent is all zeros. Thus only 236 bits of the [[significand]] appear in the memory format, but the total precision is 237 bits (approximately 71 decimal digits: {{nowrap|log<sub>10</sub>(2<sup>237</sup>) ≈ 71.344}}). The bits are laid out as follows:
 
[[File:Octuple precision visual demonstration.svg|1000px|Layout of octuple-precision floating-point format]]
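These parameters can be sanity-checked directly: the field widths sum to 256 bits, and the equivalent decimal precision follows from the 237-bit significand. A minimal illustrative sketch in Python (the constant names are ours, not from any standard):

```python
import math

# binary256 field widths: 1 sign bit, 19 exponent bits, 236 stored fraction bits
SIGN_BITS, EXP_BITS, FRAC_BITS = 1, 19, 236
assert SIGN_BITS + EXP_BITS + FRAC_BITS == 256

# total precision includes the implicit leading 1 bit
PRECISION = FRAC_BITS + 1                        # 237 bits
digits = PRECISION * math.log10(2)               # equivalent decimal digits
print(f"log10(2**{PRECISION}) = {digits:.3f}")   # ≈ 71.344
```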
 
=== Exponent encoding ===
The octuple-precision binary floating-point exponent is encoded using an [[offset binary]] representation, with the zero offset being 262143; also known as exponent bias in the IEEE&nbsp;754 standard.
 
* E<sub>min</sub> = −262142
* E<sub>max</sub> = 262143
* [[Exponent bias]] = 3FFFF<sub>16</sub> = 262143
 
Thus, as defined by the offset binary representation, in order to get the true exponent the offset of 262143 has to be subtracted from the stored exponent.
 
The stored exponents 00000<sub>16</sub> and 7FFFF<sub>16</sub> are interpreted specially.
 
{| class="wikitable" style="text-align: center;"
|-
! Exponent !! Significand zero !! Significand non-zero !! Equation
|-
| 00000<sub>16</sub> || [[0 (number)|0]], [[−0]] || [[subnormal numbers]] || (−1)<sup>signbit</sup> × 2<sup>−262142</sup> × 0.significandbits<sub>2</sub>
|-
| 00001<sub>16</sub>, ..., 7FFFE<sub>16</sub> ||colspan=2| normalized value || (−1)<sup>signbit</sup> × 2<sup>exponentbits<sub>2</sub> − 262143</sup> × 1.significandbits<sub>2</sub>
|-
| 7FFFF<sub>16</sub> || ±[[infinity|∞]] || [[NaN]] (quiet, signaling)
|}
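The three cases in the table translate directly into a decoder. The following Python sketch (illustrative only, not from any library) interprets a 64-hex-digit binary256 bit pattern exactly, using rationals for the finite cases:

```python
from fractions import Fraction

EXP_BITS, FRAC_BITS, BIAS = 19, 236, 262143

def decode_binary256(hex_str):
    """Decode a binary256 bit pattern (64 hex digits) per the table above."""
    bits = int(hex_str.replace(" ", ""), 16)
    sign = bits >> (EXP_BITS + FRAC_BITS)
    exponent = (bits >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
    frac = bits & ((1 << FRAC_BITS) - 1)
    s = -1 if sign else 1
    if exponent == 0:                        # zeros and subnormal numbers
        return s * Fraction(frac, 1 << FRAC_BITS) * Fraction(1, 2) ** 262142
    if exponent == (1 << EXP_BITS) - 1:      # infinities and NaNs
        return float("nan") if frac else s * float("inf")
    return s * (1 + Fraction(frac, 1 << FRAC_BITS)) * Fraction(2) ** (exponent - BIAS)

one = decode_binary256("3ffff" + "0" * 59)   # the bit pattern of 1
```

For instance, the pattern 3fff&nbsp;f000&nbsp;…&nbsp;0000 from the examples below decodes to exactly 1.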
 
The minimum strictly positive (subnormal) value is {{nowrap|2<sup>−262378</sup> ≈ 10<sup>−78984</sup>}} and has a precision of only one bit.
The minimum positive normal value is 2<sup>−262142</sup> ≈ 2.4824 × 10<sup>−78913</sup> and has a precision of 236&nbsp;bits, i.e. ±2<sup>−262378</sup> as well.
The maximum representable value is 2<sup>262144</sup> − 2<sup>261907</sup> ≈ 1.6113 × 10<sup>78913</sup>.
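These decimal approximations can be reproduced from the binary exponents alone. A small Python sketch (the helper name approx_pow2 is ours, for illustration; for the maximum value, the subtracted term 2<sup>261907</sup> is far too small to affect the leading digits, so 2<sup>262144</sup> suffices):

```python
import math

def approx_pow2(p):
    """Approximate 2**p as (mantissa, exponent) in decimal scientific notation."""
    t = p * math.log10(2)
    e = math.floor(t)
    return 10 ** (t - e), e

m_sub, e_sub = approx_pow2(-262378)   # minimum subnormal, ≈ 2.248 × 10**-78984
m_min, e_min = approx_pow2(-262142)   # minimum normal,    ≈ 2.482 × 10**-78913
m_max, e_max = approx_pow2(262144)    # ≈ maximum value,   ≈ 1.611 × 10**78913
```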
 
=== Octuple-precision examples ===
 
These examples are given in bit ''representation'', in [[hexadecimal]],
of the floating-point value. This includes the sign, (biased) exponent, and significand.
 
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = +0
8000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = −0
 
7fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = +infinity
ffff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub> = −infinity
 
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001<sub>16</sub>
= 2<sup>−262142</sup> × 2<sup>−236</sup> = 2<sup>−262378</sup>
≈ 2.24800708647703657297018614776265182597360918266100276294348974547709294462 × 10<sup>−78984</sup>
(smallest positive subnormal number)
 
0000 0fff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff<sub>16</sub>
= 2<sup>−262142</sup> × (1 − 2<sup>−236</sup>)
≈ 2.4824279514643497882993282229138717236776877060796468692709532979137875392 × 10<sup>−78913</sup>
(largest subnormal number)
 
0000 1000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub>
= 2<sup>−262142</sup>
≈ 2.48242795146434978829932822291387172367768770607964686927095329791378756168 × 10<sup>−78913</sup>
(smallest positive normal number)
 
7fff efff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff<sub>16</sub>
= 2<sup>262143</sup> × (2 − 2<sup>−236</sup>)
≈ 1.61132571748576047361957211845200501064402387454966951747637125049607182699 × 10<sup>78913</sup>
(largest normal number)
 
3fff efff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff<sub>16</sub>
= 1 − 2<sup>−237</sup>
≈ 0.999999999999999999999999999999999999999999999999999999999999999999999995472
(largest number less than one)
 
3fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000<sub>16</sub>
= 1 (one)
 
3fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001<sub>16</sub>
= 1 + 2<sup>−236</sup>
≈ 1.00000000000000000000000000000000000000000000000000000000000000000000000906
(smallest number larger than one)
 
By default, 1/3 rounds down like [[double precision]], because of the odd number of bits in the significand.
So the bits beyond the rounding point are <code>0101...</code> which is less than 1/2 of a [[unit in the last place]].
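This rounding behaviour can be reproduced with exact rational arithmetic. The sketch below (a hypothetical helper, not a standard API) packs a positive rational from the normal range into a binary256 bit pattern with round-to-nearest, ties-to-even; applied to 1/3 it yields the alternating fraction 0101…<sub>2</sub> (hex 55…5), rounded down:

```python
from fractions import Fraction

EXP_BITS, FRAC_BITS, BIAS = 19, 236, 262143

def round_ties_even(q):
    """Round a non-negative Fraction to the nearest integer, ties to even."""
    n, r = divmod(q.numerator, q.denominator)
    rest = Fraction(r, q.denominator)
    if rest > Fraction(1, 2) or (rest == Fraction(1, 2) and n % 2):
        n += 1
    return n

def encode_binary256(x):
    """Pack a positive rational in the normal range into binary256 hex.
    (Zero, subnormals, overflow and negatives omitted for brevity.)"""
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1                                 # now 2**e <= x < 2**(e+1)
    m = round_ties_even(x / Fraction(2) ** (e - FRAC_BITS))
    if m == 1 << (FRAC_BITS + 1):              # rounding carried into a new bit
        m >>= 1
        e += 1
    bits = (e + BIAS) << FRAC_BITS | (m - (1 << FRAC_BITS))
    return f"{bits:064x}"

third = encode_binary256(Fraction(1, 3))       # "3fffd" + "5" * 59
```

Because the discarded tail 0101… is below half a unit in the last place, the stored value is slightly less than 1/3.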
 
== Implementations ==
Octuple precision is rarely implemented in software, since its use is extremely rare. [[Apple Inc.]] had an implementation of addition, subtraction and multiplication of octuple-precision numbers with a 224-bit [[two's complement]] significand and a 32-bit exponent.<ref name="Crandall-Papadopoulos_2002"/> One can use general [[arbitrary-precision arithmetic]] libraries to obtain octuple (or higher) precision, but specialized octuple-precision implementations may achieve higher performance.
 
===Computer-language support===
Octuple precision is not a built-in type in common programming languages, but it is possible to implement octuple-precision floating-point arithmetic as a software library, for example in C++.
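As an illustration of what such a library's inner loop can look like, the following Python sketch (ours, not from any published implementation) multiplies two normal binary256 values, each represented as a 237-bit integer significand and an unbiased exponent, with round-to-nearest, ties-to-even:

```python
FRAC_BITS = 236                       # stored fraction bits; total precision is 237

def mul_binary256(m1, e1, m2, e2):
    """Multiply two normal values m * 2**(e - 236) with 2**236 <= m < 2**237."""
    m = m1 * m2                                  # exact 473- or 474-bit product
    E = (e1 - FRAC_BITS) + (e2 - FRAC_BITS)      # value = m * 2**E
    shift = m.bit_length() - (FRAC_BITS + 1)     # bits to discard (236 or 237)
    low = m & ((1 << shift) - 1)                 # discarded tail
    m >>= shift
    E += shift
    half = 1 << (shift - 1)
    if low > half or (low == half and m & 1):    # round half to even
        m += 1
        if m.bit_length() > FRAC_BITS + 1:       # carry out: renormalize
            m >>= 1
            E += 1
    return m, E + FRAC_BITS                      # (significand, unbiased exponent)
```

For example, squaring 1 (significand 2<sup>236</sup>, exponent 0) returns the same representation, and squaring 1 + 2<sup>−236</sup> correctly rounds to 1 + 2<sup>−235</sup>.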
 
=== Hardware support ===
There is little to no known hardware with native support for octuple-precision arithmetic.
 
== See also ==
* [[IEEE 754-2008|IEEE Standard for Floating-Point Arithmetic (IEEE 754)]]
* [[Extended precision]]
* [[ISO/IEC 10967]], Language-independent arithmetic
* [[Primitive data type]]
* [[Scientific notation]]
* [[Half-precision floating-point format]]
* [[Single-precision floating-point format]]
* [[Double-precision floating-point format]]
* [[Quadruple-precision floating-point format]]
 
== References ==
{{reflist|refs=
<ref name="Crandall-Papadopoulos_2002">{{cite web |title=Octuple-precision floating point on Apple G4 (archived copy on web.archive.org) |author-first1=Richard E. |author-last1=Crandall |author-link1=Richard E. Crandall |author-first2=Jason S. |author-last2=Papadopoulos |date=2002-05-08 |url=http://images.apple.com/ca/acg/pdf/oct3a.pdf |url-status=unfit |archive-url=https://web.archive.org/web/20060728140052/http://images.apple.com/ca/acg/pdf/oct3a.pdf |archive-date=2006-07-28}} (8 pages)</ref>
}}
 
== Further reading ==
* {{cite book |author-first=Nelson H. F. |author-last=Beebe |title=The Mathematical-Function Computation Handbook - Programming Using the MathCW Portable Software Library |date=2017-08-22 |___location=Salt Lake City, UT, USA |publisher=[[Springer International Publishing AG]] |edition=1 |lccn=2017947446 |isbn=978-3-319-64109-6 |doi=10.1007/978-3-319-64110-2 }}
 
{{data types}}
 
[[Category:Binary arithmetic]]
[[Category:Floating point types]]