Decimal64 floating-point format

{{Short description|64-bit computer number format}}
{{lowercase title}}
{{Use dmy dates|date=July 2020|cs1-dates=y}}
{{floating-point}}
In [[computing]], '''decimal64''' is a [[decimal floating point|decimal floating-point]] [[computer number format]] that occupies 8 bytes (64 bits) in computer memory.
 
Decimal64 was formally introduced in the [[IEEE 754-2008 revision|2008 revision]]<ref name="IEEE-754_2008">{{cite book |author=IEEE Computer Society |url=https://ieeexplore.ieee.org/document/4610935 |title=IEEE Standard for Floating-Point Arithmetic |date=2008-08-29 |publisher=[[IEEE]] |isbn=978-0-7381-5753-5 |doi=10.1109/IEEESTD.2008.4610935 |id=IEEE Std 754-2008 |ref=CITEREFIEEE_7542008 |access-date=2016-02-08}}</ref> of the [[IEEE 754]] standard, also known as ISO/IEC/IEEE 60559:2011.<ref name="ISO-60559_2011">{{Cite book |last=ISO/IEC JTC 1/SC 25 |url=https://www.iso.org/standard/57469.html |title=ISO/IEC/IEEE 60559:2011 — Information technology — Microprocessor Systems — Floating-Point arithmetic |date=June 2011 |publisher=ISO |pages=1–58}}</ref>
== Purpose and use ==
Decimal64 is a suitable replacement for the binary64 format in applications where small representation errors are unacceptable and speed is not critical.

In contrast to the binary formats, the decimal formats represent decimal fractions exactly, calculate with them exactly, and support the 'ties away from zero' rounding that people commonly use (within a limited range and precision), at the cost of reduced performance. They are intended for applications that have to match hand-calculated decimal arithmetic, such as financial and tax computations. For example, they avoid results such as 0.2 + 0.1 → 0.30000000000000004, which occur with the binary64 format.
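
The difference can be illustrated with a short sketch. It uses Python's <code>decimal</code> module with a context set to 16 digits and an exponent range of −383 to 384. This only approximates decimal64 arithmetic and is not an IEEE 754 implementation; the name <code>DECIMAL64_LIKE</code> is chosen here for illustration.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# Assumption: a context that mimics decimal64 precision and exponent range.
DECIMAL64_LIKE = Context(prec=16, Emin=-383, Emax=384, rounding=ROUND_HALF_EVEN)

binary_sum = 0.1 + 0.2                                  # binary64 arithmetic
decimal_sum = DECIMAL64_LIKE.add(Decimal("0.1"), Decimal("0.2"))

print(binary_sum == 0.3)                  # False: 0.1 and 0.2 are not exact in binary
print(decimal_sum == Decimal("0.3"))      # True: both operands are exact in decimal
```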
 
== Format ==
Decimal64 supports 'normal' values with 16 digits of precision from {{gaps|±1.000|000|000|000|000|e=-383}} to {{gaps|±9.999|999|999|999|999|e=384}}, plus 'denormal' values with gradually decreasing relative precision down to ±1×10<sup>−398</sup>, [[signed zero]]s, signed infinities and [[NaN]] (Not a Number). This format supports two different encodings.
 
The binary format of the same size (binary64) supports a range from the smallest denormal value {{gaps|±5|||||e=-324|}}, through the smallest normal value with full 53-bit precision {{gaps|±2.225|073|858|507|201|e=-308|4}}, to the maximum {{gaps|±1.797|693|134|862|315|e=+308|7}}.
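
The limits quoted above can be checked with a small sketch. It again assumes a Python <code>decimal</code> context configured with decimal64-like parameters (the context and variable names are illustrative), and reads the binary64 limits from <code>sys.float_info</code>.

```python
import sys
from decimal import Context, Decimal

# Assumption: prec/Emin/Emax chosen to resemble decimal64.
ctx = Context(prec=16, Emin=-383, Emax=384)

smallest_normal = Decimal("1.000000000000000E-383")
smallest_subnormal = Decimal(f"1E{ctx.Etiny()}")   # Etiny = Emin - prec + 1

print(ctx.Etiny())                                 # exponent of the smallest denormal
print(ctx.is_normal(smallest_normal))
print(ctx.is_subnormal(smallest_subnormal))
print(sys.float_info.min, sys.float_info.max)      # binary64 normal-min and max
```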
 
Because the significand for the [[IEEE 754]] decimal formats is not normalized, most values with fewer than 16 [[significant digits]] have multiple possible representations; 1000000 × 10<sup>−2</sup> = 100000 × 10<sup>−1</sup> = 10000 × 10<sup>0</sup> = 1000 × 10<sup>1</sup> all have the value 10000. These sets of representations of the same value are called ''[[Cohort (floating point)|cohorts]]''; the different members can be used to denote how many digits of the value are known precisely. Each signed zero has 768 possible representations (1536 for all zeros, in two different cohorts).
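
As an illustration of cohorts, the following sketch uses Python's <code>decimal</code> module, which also keeps unnormalized coefficients. The list name <code>cohort</code> and the exponent range used for counting zeros (the raw exponents 0 to 767 minus the bias of 398) are assumptions made for this example.

```python
from decimal import Decimal

cohort = [Decimal("1000000E-2"), Decimal("100000E-1"),
          Decimal("10000E0"), Decimal("1000E1")]

# All members compare equal although coefficient and exponent differ.
print(all(member == 10000 for member in cohort))

# Each member keeps its own coefficient length and exponent.
shapes = [(len(m.as_tuple().digits), m.as_tuple().exponent) for m in cohort]
print(shapes)

# A zero coefficient can be paired with every exponent from -398 to 369.
zero_exponents = range(0 - 398, 767 - 398 + 1)
print(len(zero_exponents))
```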
 
== Encoding of decimal64 values ==
Decimal64 values are represented in an unnormalized form similar to scientific notation, in which some bits of the exponent are combined with the leading bits of the significand in a 'combination field'.
 
{| class="wikitable"
|-
! <u>S</u>ign !! Co<u>m</u>bination !! <u>T</u>railing significand bits
|-
! 1 bit !! 13 bits !! 50 bits
|-
| {{mono|s}} || {{mono|mmmmmmmmmmmmm}} || {{mono|tttttttttttttttttttttttttttttttttttttttttttttttttt}}
|}
 
In the cases of Infinity and NaN, all other bits of the encoding are ignored. Thus, it is possible to initialize an array to Infinities or NaNs by filling it with a single byte value.
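
A minimal sketch of the byte-fill idea follows. The byte values 0x78, 0xF8, 0x7C and 0x7E are derived here from the bit patterns for infinities and NaNs and are not taken from the standard's text; the helper names <code>classify</code> and <code>filled</code> are hypothetical.

```python
def classify(bits: int) -> str:
    """Classify a 64-bit decimal64 pattern by the five bits after the sign."""
    top5 = (bits >> 58) & 0b11111
    if top5 == 0b11110:
        return "-Infinity" if bits >> 63 else "+Infinity"
    if top5 == 0b11111:
        # The bit after the five combination bits selects signaling vs quiet.
        return "sNaN" if (bits >> 57) & 1 else "qNaN"
    return "finite"

def filled(byte: int) -> int:
    """Return the 64-bit pattern made of one byte value repeated eight times."""
    return int.from_bytes(bytes([byte]) * 8, "big")

for byte in (0x78, 0xF8, 0x7C, 0x7E):
    print(hex(byte), classify(filled(byte)))
```

Because every byte is identical, the result does not depend on the byte order of the machine.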
 
=== Binary integer significand field ===
This format uses a binary significand from 0 to {{math|size=100%|1=10<sup>16</sup> − 1 = {{gaps|9|999|999|999|999|999}} = 2386F26FC0FFFF<sub>16</sub> = {{gaps|1000|1110000110|1111001001|1011111100|0000111111|1111111111<sub>2</sub>}}.}} The encoding, completely stored on 64 bits, can represent binary significands up to {{math|size=100%|1=10 × 2<sup>50</sup> − 1 = {{gaps|11|258|999|068|426|239}} = 27FFFFFFFFFFFF<sub>16</sub>,}} but values larger than {{math|size=100%|1=10<sup>16</sup> − 1}} are illegal (and the standard requires implementations to treat them as 0, if encountered on input).
 
 
If the {{val|2|u=bits}} after the sign bit are "11", then the 10-bit exponent field is shifted {{val|2|u=bits}} to the right (after both the sign bit and the "11" bits thereafter), and the represented significand is in the remaining {{val|51|u=bits}}. In this case there is an implicit (that is, not stored) leading 3-bit sequence "100" for the MSB bits of the true significand (in the remaining lower bits ''ttt...ttt'' of the significand, not all possible values are used).
 
Note that the bit numbering used in the tables (m<sub>12</sub> … m<sub>0</sub>) runs in the opposite direction to that used in the IEEE 754 standard (G<sub>0</sub> … G<sub>12</sub>).
 
{| class="wikitable" style="text-align:left; border-width:0;"
! rowspan="2" |Significand / Description
|-
! m<sub>12</sub>!! m<sub>11</sub>!! m<sub>10</sub>!! m<sub>9</sub>!! m<sub>8</sub>!! m<sub>7</sub>!! m<sub>6</sub>!! m<sub>5</sub>!! m<sub>4</sub>!! m<sub>3</sub>!! m<sub>2</sub>
!m<sub>1</sub>
!m<sub>0</sub>
|-
| colspan="16" |combination field not starting with '11', bits ab = 00, 01 or 10
|-
| style="font-family:monospace; background:#cedff2;" | '''a''' || style="font-family:monospace; background:#cedff2;" | '''b''' || style="font-family:monospace; background:#cedff2;" | '''c''' || style="font-family:monospace; background:#cedff2;" | '''d''' || style="font-family:monospace; background:#cedff2;" | '''m''' || style="font-family:monospace; background:#cedff2;" | '''m''' || style="font-family:monospace; background:#cedff2;" | '''m''' || style="font-family:monospace; background:#cedff2;" | '''m''' || style="font-family:monospace; background:#cedff2;" | '''m''' || style="font-family:monospace; background:#cedff2;" | '''m''' || style="font-family:monospace; background:#cef2e0;" | '''e''' || style="font-family:monospace; background:#cef2e0;" |'''f''' || style="font-family:monospace; background:#cef2e0;" |'''g'''
| || style="font-family:monospace; background:#cedff2;" | '''abcdmmmmmm''' || style="background:#cef2e0;" | {{mono|(0)'''efgtttttttttttttttttttttttttttttttttttttttttttttttttt''' }}
Finite number with 'small' significand, below 9007199254740992 (2<sup>53</sup>), which fits into 53 bits.
|-
| colspan="16" |combination field starting with '11', but not 1111, bits ab = 11, bits cd = 00, 01 or 10
|-
| 1 || 1 || style="font-family:monospace; background:#cedff2;" | '''c'''|| style="font-family:monospace; background:#cedff2;" | '''d''' || style="font-family:monospace; background:#cedff2;" | '''m''' || style="font-family:monospace; background:#cedff2;" | '''m''' || style="font-family:monospace; background:#cedff2;" | '''m''' || style="font-family:monospace; background:#cedff2;" | '''m''' || style="font-family:monospace; background:#cedff2;" | '''m''' || style="font-family:monospace; background:#cedff2;" | '''m''' || style="font-family:monospace; background:#cedff2;" | '''e''' || style="font-family:monospace; background:#cedff2;" | '''f''' || style="font-family:monospace; background:#cef2e0;" | '''g'''
| || style="font-family:monospace; background:#cedff2;" | '''cdmmmmmmef''' || style="background:#cef2e0;" | {{mono|'''100gtttttttttttttttttttttttttttttttttttttttttttttttttt''' }}
Finite number with 'big' significand, above 9007199254740991, which needs 54 bits.
|-
| colspan="16" |combination field starting with '1111', bits abcd = 1111
|signaling NaN (with payload in significand)
|}
The leading bits of the significand field do ''not'' encode the most significant decimal digit; they are simply part of a larger pure-binary number. For example, a significand of {{gaps|8|000|000|000|000|000}} is encoded as binary {{gaps|0111|0001101011|1111010100|1001100011|0100000000|0000000000}}<sub>2</sub>, with the leading {{val|4|u=bits}} encoding 7; the first significand which requires a 54th bit is {{math|size=100%|1=2<sup>53</sup> = {{gaps|9|007|199|254|740|992}}.}} The highest valid significand is {{gaps|9|999|999|999|999|999}}, whose binary encoding is
{{gaps|(100)0|1110000110|1111001001|1011111100|0000111111|1111111111}}<sub>2</sub> (with the 3 most significant bits (100) not stored but implicit as shown above; the next bit is always zero in valid encodings).
 
The resulting 'raw' exponent is a 10-bit binary integer whose leading bits are not '11', giving values 0 … 1011111111<sub>2</sub> = 0 … 767<sub>10</sub>, from which the bias of 398 is subtracted. The resulting significand could be a positive binary integer of 54 bits up to 1001 1111111111 1111111111 1111111111 1111111111 1111111111<sub>2</sub> = 11258999068426239<sub>10</sub>, but values above 10<sup>16</sup> − 1 = 9999999999999999 = 2386F26FC0FFFF<sub>16</sub> = 100011100001101111001001101111110000001111111111111111<sub>2</sub> are 'illegal' and must be treated as zero. To obtain the individual decimal digits, the significand has to be divided by 10 repeatedly.
 
In the above cases, the value represented is
: {{math|1=(−1)<sup>sign</sup> × 10<sup>exponent−398</sup> × significand}} <!-- Remember, significand is defined as an integer: 0 <= significand < 10^16 -->
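
The decoding rules for the binary integer significand field can be sketched as follows. This is a minimal illustration based on the description above, not a reference implementation; the function names <code>decode_bid</code> and <code>decimal_digits</code> are hypothetical, and infinities and NaNs are simply reported as <code>None</code>.

```python
BIAS = 398
MAX_SIGNIFICAND = 10**16 - 1

def decode_bid(bits: int):
    """Decode a 64-bit pattern into (sign, exponent, significand).

    Returns None for infinities and NaNs, which this sketch does not decode.
    """
    sign = bits >> 63
    if (bits >> 59) & 0b1111 == 0b1111:        # infinity or NaN
        return None
    if (bits >> 61) & 0b11 == 0b11:            # 'big' form: implicit 100 prefix
        raw_exponent = (bits >> 51) & 0x3FF
        significand = (0b100 << 51) | (bits & ((1 << 51) - 1))
    else:                                      # 'small' form: 53 stored bits
        raw_exponent = (bits >> 53) & 0x3FF
        significand = bits & ((1 << 53) - 1)
    if significand > MAX_SIGNIFICAND:          # illegal value, treated as zero
        significand = 0
    return sign, raw_exponent - BIAS, significand

def decimal_digits(significand: int):
    """Obtain the decimal digits by repeated division by 10."""
    digits = []
    while significand:
        significand, digit = divmod(significand, 10)
        digits.append(digit)
    return digits[::-1] or [0]

print(decode_bid(0x31C0000000000001))          # the value 1 with exponent 0
```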
 
If the four bits after the sign bit are "1111" then the value is an infinity or a NaN, as described above:

 0 11110 xx...x    +infinity
 1 11110 xx...x    -infinity
 x 11111 0x...x    a quiet NaN
 x 11111 1x...x    a signalling NaN
 
=== Densely packed decimal significand field ===
In this version, the significand is stored as a series of decimal digits. The leading digit is between 0 and 9 (3 or 4 binary bits), and the rest of the significand uses the [[densely packed decimal]] (DPD) encoding.
 
If the first two bits after the sign bit are "00", "01", or "10", then those are the leading bits of the exponent, and the three bits "cde" after that are interpreted as the leading decimal digit (0 to 7).

If the first two bits after the sign bit are "11", then the next two bits are the leading bits of the exponent, and the following bit "e" is prefixed with the implicit bits "100" to form the leading decimal digit of the significand (8 or 9).
 
The remaining two combinations (11 110 and 11 111) of the 5-bit field after the sign bit are used to represent ±infinity and NaNs, respectively.
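
The three cases above can be sketched as a small function that splits the 13 bits after the sign bit. The name <code>decode_combination</code> and the tuple layout of its result are hypothetical choices for this example; the trailing 50 significand bits are not decoded here.

```python
def decode_combination(bits: int):
    """Split the combination field of a DPD-encoded decimal64 pattern.

    Returns ("finite", raw_exponent, leading_digit), ("infinity",) or ("nan",).
    """
    five = (bits >> 58) & 0b11111             # the five bits after the sign bit
    continuation = (bits >> 50) & 0xFF        # remaining eight exponent bits
    if five == 0b11110:
        return ("infinity",)
    if five == 0b11111:
        return ("nan",)
    if five >> 3 == 0b11:                     # 11 cd e: leading digit 8 or 9
        exponent_high = (five >> 1) & 0b11
        leading_digit = 0b1000 | (five & 1)
    else:                                     # ab cde: leading digit 0 to 7
        exponent_high = five >> 3
        leading_digit = five & 0b111
    return ("finite", (exponent_high << 8) | continuation, leading_digit)

print(decode_combination(0x2238000000000001))  # the value 1 with exponent 0
```

The raw exponent returned here still includes the bias of 398.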
! rowspan="2" |Significand / Description
|-
! m<sub>12</sub>!! m<sub>11</sub>!! m<sub>10</sub>!! m<sub>9</sub>!! m<sub>8</sub>!! m<sub>7</sub>!! m<sub>6</sub>!! m<sub>5</sub>!! m<sub>4</sub>!! m<sub>3</sub>!! m<sub>2</sub>
!m<sub>1</sub>
!m<sub>0</sub>
|-
| colspan="16" |combination field not starting with '11', bits ab = 00, 01 or 10
|-
| style="font-family:monospace; background:#cedff2;" | '''a''' || style="font-family:monospace; background:#cedff2;" | '''b''' || style="font-family:monospace; background:#cef2e0;" | '''c''' || style="font-family:monospace; background:#cef2e0;" | '''d''' || style="font-family:monospace; background:#cef2e0;" | '''e''' || style="font-family:monospace; background:#cedff2;" | '''m''' || style="font-family:monospace; background:#cedff2;" | '''m''' || style="font-family:monospace; background:#cedff2;" | '''m''' || style="font-family:monospace; background:#cedff2;" | '''m''' || style="font-family:monospace; background:#cedff2;" | '''m''' || style="font-family:monospace; background:#cedff2;" | '''m''' || style="font-family:monospace; background:#cedff2;" | '''m''' || style="font-family:monospace; background:#cedff2;" | '''m'''
| || style="font-family:monospace; background:#cedff2;" | '''abmmmmmmmm'''|| style="background:#cef2e0;" | {{nowrap|{{mono|(0)'''cde tttttttttt tttttttttt tttttttttt tttttttttt tttttttttt''' }}}}
Finite number with small first digit of significand (0&nbsp;…&nbsp;7).
|-
| colspan="16" |combination field starting with '11', but not 1111, bits ab = 11, bits cd = 00, 01 or 10
|-
| 1 || 1 || style="font-family:monospace; background:#cedff2;" | '''c''' || style="font-family:monospace; background:#cedff2;" | '''d''' || style="font-family:monospace; background:#cef2e0;" | '''e''' || style="font-family:monospace; background:#cedff2;" | '''m''' || style="font-family:monospace; background:#cedff2;" | '''m''' || style="font-family:monospace; background:#cedff2;" | '''m''' || style="font-family:monospace; background:#cedff2;" | '''m''' || style="font-family:monospace; background:#cedff2;" | '''m''' || style="font-family:monospace; background:#cedff2;" | '''m''' || style="font-family:monospace; background:#cedff2;" | '''m''' || style="font-family:monospace; background:#cedff2;" | '''m'''
| || style="font-family:monospace; background:#cedff2;" | '''cdmmmmmmmm'''|| style="background:#cef2e0;" | {{nowrap|{{mono|'''100e tttttttttt tttttttttt tttttttttt tttttttttt tttttttttt''' }}}}
Finite number with big first digit of significand (8 or 9).
|-
|}
 
The resulting 'raw' exponent is a 10-bit binary integer whose leading bits are not '11', giving values 0 … 1011111111<sub>2</sub> = 0 … 767<sub>10</sub>, from which the bias of 398 is subtracted. The leading decimal digit of the significand is formed from the '''(0)cde''' or '''100e''' bits, read as a binary integer. The subsequent digits are encoded in the 10-bit 'declet' fields 'tttttttttt' according to the DPD rules (see below). The full decimal significand is then obtained by concatenating the leading and trailing decimal digits.

The 10-bit DPD to 3-digit BCD transcoding for the declets is given by the following table. b<sub>9</sub> … b<sub>0</sub> are the bits of the DPD, and d<sub>2</sub> … d<sub>0</sub> are the three BCD digits. Note that the bit numbering used here (b<sub>9</sub> … b<sub>0</sub>) runs in the opposite direction to that used in the IEEE 754 standard (b<sub>0</sub> … b<sub>9</sub>); in addition, the decimal digits are numbered from 0 here, whereas the IEEE 754 standard numbers them from 1 and in the opposite direction. The bits on a white background do not contribute to the value, but indicate how to interpret or shift the other bits. The concept is to denote which digits are small (0 … 7) and encoded in three bits, and which are large (8 or 9) and formed from a prefix of '100' followed by one bit specifying 8 or 9.
 
{{Densely packed decimal}}
The 8 decimal values whose digits are all 8s or 9s have four codings each.
The bits marked x in the table above are [[don't care|ignored]] on input, but will always be 0 in computed results.
(The {{math|size=100%|1=8 × 3 = 24}} non-standard encodings fill in the gap between {{math|size=100%|1=10<sup>3</sup> = 1000}} and {{math|size=100%|1=2<sup>10</sup> = 1024}}.)
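
The declet decoding and the count of redundant encodings can be checked with the sketch below. The case analysis is written from the commonly published DPD decoding rules and is an assumption of this example; the table above remains the authoritative description. The names <code>decode_declet</code>, <code>small</code> and <code>large</code> are hypothetical.

```python
from collections import Counter

def decode_declet(declet: int):
    """Decode one 10-bit DPD declet into three decimal digits (d2, d1, d0)."""
    b9, b8, b7, b6, b5, b4, b3, b2, b1, b0 = (
        (declet >> i) & 1 for i in range(9, -1, -1))

    def small(x, y, z):      # digit 0 to 7 from three bits
        return (x << 2) | (y << 1) | z

    def large(z):            # digit 8 or 9 from the prefix 100 and one bit
        return 8 | z

    if b3 == 0:                               # all three digits are small
        return small(b9, b8, b7), small(b6, b5, b4), small(b2, b1, b0)
    if (b2, b1) == (0, 0):                    # only d0 is large
        return small(b9, b8, b7), small(b6, b5, b4), large(b0)
    if (b2, b1) == (0, 1):                    # only d1 is large
        return small(b9, b8, b7), large(b4), small(b6, b5, b0)
    if (b2, b1) == (1, 0):                    # only d2 is large
        return large(b7), small(b6, b5, b4), small(b9, b8, b0)
    if (b6, b5) == (0, 0):                    # d2 and d1 are large
        return large(b7), large(b4), small(b9, b8, b0)
    if (b6, b5) == (0, 1):                    # d2 and d0 are large
        return large(b7), small(b9, b8, b4), large(b0)
    if (b6, b5) == (1, 0):                    # d1 and d0 are large
        return small(b9, b8, b7), large(b4), large(b0)
    return large(b7), large(b4), large(b0)    # all large; b9 and b8 are ignored

counts = Counter(decode_declet(d) for d in range(1024))
redundant = sum(n - 1 for n in counts.values())
print(len(counts), redundant)                 # distinct values, extra encodings
```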
 
In the above cases, with the ''true significand'' as the sequence of decimal digits decoded, the value represented is
 
:<math>(-1)^\text{signbit}\times 10^{\text{exponentbits}_2-398_{10}}\times \text{truesignificand}_{10}</math>
 
== History ==
decimal64 was formally introduced in the [[IEEE 754-2008 revision|2008 revision]]<ref name="IEEE-754_2008">{{cite book |author=IEEE Computer Society |url=https://ieeexplore.ieee.org/document/4610935 |title=IEEE Standard for Floating-Point Arithmetic |date=2008-08-29 |publisher=[[IEEE]] |isbn=978-0-7381-5753-5 |doi=10.1109/IEEESTD.2008.4610935 |id=IEEE Std 754-2008 |ref=CITEREFIEEE_7542008 |access-date=2016-02-08}}</ref> of the [[IEEE 754]] standard, which was taken over into the ISO/IEC/IEEE 60559:2011<ref name="ISO-60559_2011">{{Cite book |last=ISO/IEC JTC 1/SC 25 |url=https://www.iso.org/standard/57469.html |title=ISO/IEC/IEEE 60559:2011 — Information technology — Microprocessor Systems — Floating-Point arithmetic |date=June 2011 |publisher=ISO |pages=1–58}}</ref> standard.
 
== Side effects, more info ==
Zero has 768 possible representations (1536 when both signed zeros are counted, in two different cohorts), and many more if the 'illegal' significands, which have to be treated as zero, are included.
 
The gain in range and precision from the 'combination encoding' arises because the two bits taken from the exponent use only three states, and the four most significant bits of the significand stay within 0000&nbsp;…&nbsp;1001 (10 states). In total that is {{math|1=3&nbsp;×&nbsp;10&nbsp;=&nbsp;30}} possible values when combined in one encoding, which can be represented in 5 bits ({{tmath|1=2^5=32}}).
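
The count can be verified by enumerating all 32 five-bit prefixes. The sketch assumes the DPD reading of the combination field described earlier (two exponent bits plus a leading decimal digit); the variable names are illustrative.

```python
from itertools import product

pairs = set()
for five in range(32):
    if five >> 1 == 0b1111:
        continue                              # 11110 and 11111: infinity and NaN
    if five >> 3 == 0b11:                     # leading digit 8 or 9
        pairs.add(((five >> 1) & 0b11, 8 | (five & 1)))
    else:                                     # leading digit 0 to 7
        pairs.add((five >> 3, five & 0b111))

# Every (exponent prefix, leading digit) pair occurs exactly once.
print(len(pairs))
print(pairs == set(product(range(3), range(10))))
```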
 
== See also ==
* [[ISO/IEC 10967]], Language Independent Arithmetic
* [[Primitive data type]]
* [[Q notation (scientific notation)|D (E) notation (scientific notation)]]
 
== References ==