In [[computing]], '''decimal64''' is a [[decimal floating point|decimal floating-point]] [[computer number format]] that occupies 8 bytes (64 bits) in computer memory.
== Purpose and use ==
decimal64 is well suited to replace the binary64 format in applications where small decimal deviations are unwanted and speed is not critical.
In contrast to the '''binary'''xxx formats, the '''decimal'''xxx formats provide exact representation of decimal fractions, exact calculations with them, and allow the common 'ties away from zero' rounding taught in school (within some range, to some precision, to some degree), in a trade-off for reduced performance. They are intended for applications that need to match everyday decimal arithmetic, such as financial and tax computations. (In short, they avoid problems like 0.2 + 0.1 -> 0.30000000000000004 which happen with binary64 datatypes.)
== Range and precision ==
Decimal64 supports 'normal' values with 16-digit precision from {{gaps|±1.000|000|000|000|000|e=-383}} to {{gaps|±9.999|999|999|999|999|e=384}}, plus 'denormal' values with ramp-down relative precision, down to ±1 × 10<sup>−398</sup> (only one digit left), [[signed zero]]s, signed infinities and [[NaN]] (Not a Number).
The binary format of the same bit size supports a range from the smallest denormal value {{gaps|±4.9|e=-324}}, over the smallest normal value with full 53-bit precision {{gaps|±2.225|073|858|507|201|4|e=-308}}, up to the maximum {{gaps|±1.797|693|134|862|315|7|e=+308}}.
== Performance ==
Performance comparisons are difficult, of limited accuracy and hard to reproduce on modern IT systems, for various reasons. Roughly, on a current 64-bit Intel / Linux / gcc / libdfp / BID implementation, basic arithmetic operations on decimal64 values are between 2 and 15 times slower than on binary64 datatypes, while 'higher' functions such as powers (factor ~600) and trigonometric functions such as tangent (factor {{gaps|~10|000}}) suffer a greater performance penalty. To get an idea of the performance on a specific system, the code in 'Addendum - code' can be used.
== Representation / encoding of decimal64 values ==
decimal64 values are represented in a 'not normalized' format close to scientific notation, combining some bits of the exponent with the leading bits of the significand in a 'combination field'.
{| class="wikitable"
|+ Generic encoding
|-
! <u>S</u>ign !! Co<u>m</u>bination !! <u>T</u>railing significand bits
|-
! 1 bit !! 13 bits !! 50 bits
|-
| {{mono|s}} || {{mono|mmmmmmmmmmmmm}} || {{mono|tttttttttttttttttttttttttttttttttttttttttttttttttt}}
|}
Besides the special cases of infinities and NaNs, four points are relevant to understanding the encoding of decimal64.
* BID vs. DPD encoding: '''B'''inary '''I'''nteger '''D'''ecimal uses a positive [[binary integer decimal]] for the significand; it is software-centric and was designed by Intel. '''D'''ensely '''P'''acked '''D'''ecimal is based on [[densely packed decimal]] encoding for all except the first digit of the significand; it is hardware-centric and promoted by IBM. For the differences see below. Both alternatives provide exactly the same range of representable numbers: 16 digits of significand and {{math|size=100%|1=3 × 2<sup>8</sup> = 768}} possible exponent values. IEEE 754 allows both encodings, without a concept to denote which one is used, for instance in a situation where decimal64 values are communicated between systems. Caution: transferring binary data between systems using different encodings will mostly produce valid decimal64 numbers, '''but with a different value'''. Prefer data exchange in integral or ASCII 'triplets' for sign, exponent and significand.
* The significands are not 'normalized' (the leading digit(s) are allowed to be "0"), and thus most values with fewer than 16 [[significant digits]] have multiple possible representations; 1000000 × 10<sup>-2</sup> = 100000 × 10<sup>-1</sup> = 10000 × 10<sup>0</sup> = 1000 × 10<sup>1</sup> all have the value 10000. These sets of representations of the same value are called [[Cohort (floating point)|cohorts]]; the different members can be used to denote how many digits of the value are known precisely.
* The encodings combine two bits of the exponent with the leading 3 to 4 bits of the significand in a 'combination field', differently for 'big' vs. 'small' significands. This enables greater precision and range, with the trade-off that some simple operations like sort and compare, which are used very frequently in code, do not work on the bit pattern but require computations to extract exponent and significand and then obtain an exponent-aligned representation. This effort is partly balanced by saving the effort of normalization, but contributes to the slower performance of the decimal datatypes. Beware: BID and DPD use different bits of the combination field for this, see below.
* Different understandings of the significand as integer or fraction, with correspondingly different biases to apply to the exponent: for decimal64, what is stored in the bits can be decoded as the base raised to the power of 'stored exponent value minus a '''bias of 383'''<nowiki/>' times the significand understood as d<sub>0</sub> '''.''' d<sub>−1</sub> d<sub>−2</sub> … d<sub>−15</sub> (note: radix dot after the first digit, significand '''fractional'''), or as the base raised to the power of 'stored exponent value minus a '''bias of 398'''<nowiki/>' times the significand understood as d<sub>15</sub> d<sub>14</sub> … d<sub>0</sub> (note: no radix dot, significand '''integral'''). Both produce the same result [2019 version of IEEE 754, clause 3.3, page 18]. For decimal datatypes the second view is more common, while for binary datatypes the first is; the biases differ for each datatype.
In all cases for decimal64, the value represented is
: (−1)<sup>''sign''</sup> × 10<sup>''exponent''−'''398'''</sup> × ''significand'', with the ''significand'' understood as positive integer.
Alternatively it can be understood as (−1)<sup>''sign''</sup> × 10<sup>''exponent''−'''383'''</sup> × ''significand'' with the ''significand'' digits understood as d<sub>0</sub> '''.''' d<sub>−1</sub> d<sub>−2</sub> d<sub>−3</sub> d<sub>−4</sub> d<sub>−5</sub> d<sub>−6 ...</sub>, note the radix dot making it a fraction.
decimal64 is superior to binary64 in range, and head to head in precision: all normal decimal64 values have 16 digits, while binary64 alternates between slightly less than 16 and about 16.5 decimal digits (17 digits with gaps).
=== BID encoding ===
This format uses a binary significand from 0 to {{math|size=100%|1=10<sup>16</sup> − 1 = {{gaps|9|999|999|999|999|999}} = 2386F26FC0FFFF<sub>16</sub> = {{gaps|1000|1110000110|1111001001|1011111100|0000111111|1111111111<sub>2</sub>}}}}. The encoding, completely stored in 64 bits, can represent binary significands up to {{math|size=100%|1=10 × 2<sup>50</sup> − 1 = {{gaps|11|258|999|068|426|239}} = 27FFFFFFFFFFFF<sub>16</sub>}}, but values larger than {{math|size=100%|1=10<sup>16</sup> − 1}} are illegal (and the standard requires implementations to treat them as 0, if encountered on input).
If the {{val|2|u=bits}} after the sign bit are "11", then the 10-bit exponent field is shifted {{val|2|u=bits}} to the right (after both the sign bit and the "11" bits thereafter), and the represented significand is in the remaining {{val|51|u=bits}}. In this case there is an implicit (that is, not stored) leading 3-bit sequence "100" for the MSB bits of the true significand (in the remaining lower bits ''ttt...ttt'' of the significand, not all possible values are used).
Be aware that the bit numbering used in the tables, e.g. m<sub>12</sub> … m<sub>0</sub>, runs in the opposite direction to that used in the IEEE 754 standard document, G<sub>0</sub> … G<sub>12</sub>.
{| class="wikitable" style="text-align:left; border-width:0;"
! rowspan="2" |Significand / Description
|-
! m<sub>12</sub>!! m<sub>11</sub>!! m<sub>10</sub>!! m<sub>9</sub>!! m<sub>8</sub>!! m<sub>7</sub>!! m<sub>6</sub>!! m<sub>5</sub>!! m<sub>4</sub>!! m<sub>3</sub>!! m<sub>2</sub>!! m<sub>1</sub>!! m<sub>0</sub>
|-
| colspan="16" |combination field not starting with '11', bits ab = 00, 01 or 10
| style="background:#cedff2;" | '''a''' || style="background:#cedff2;" | '''b''' || style="background:#cedff2;" | '''c''' || style="background:#cedff2;" | '''d''' || style="background:#cedff2;" | '''m''' || style="background:#cedff2;" | '''m''' || style="background:#cedff2;" | '''m''' || style="background:#cedff2;" | '''m''' || style="background:#cedff2;" | '''m''' || style="background:#cedff2;" | '''m''' || style="background:#cef2e0;" | '''e''' || style="background:#cef2e0;" |'''f''' || style="background:#cef2e0;" |'''g'''
| || style="background:#cedff2;" | '''abcdmmmmmm''' || style="background:#cef2e0;" | (0)'''efgtttttttttttttttttttttttttttttttttttttttttttttttttt'''
|Finite number with 'small' significand
|-
| colspan="16" |combination field starting with '11', but not 1111, bits ab = 11, bits cd = 00, 01 or 10
| 1 || 1 || style="background:#cedff2;" | '''c'''|| style="background:#cedff2;" | '''d''' || style="background:#cedff2;" | '''m''' || style="background:#cedff2;" | '''m''' || style="background:#cedff2;" | '''m''' || style="background:#cedff2;" | '''m''' || style="background:#cedff2;" | '''m''' || style="background:#cedff2;" | '''m''' || style="background:#cedff2;" | '''e''' || style="background:#cedff2;" | '''f''' || style="background:#cef2e0;" | '''g'''
| || style="background:#cedff2;" | '''cdmmmmmmef''' || style="background:#cef2e0;" | '''100gtttttttttttttttttttttttttttttttttttttttttttttttttt'''
|Finite number with 'big' significand (implicit leading significand bits '100')
|-
| colspan="16" |combination field starting with '1111', bits abcd = 1111
|signaling NaN (with payload in significand)
|}
The resulting 'raw' exponent is a 10-bit binary integer whose leading bits are not '11', thus values 0 … 1011111111<sub>b</sub> = 0 … 767<sub>d</sub>, from which the appropriate bias is to be subtracted. The resulting significand could be a positive binary integer of 54 bits, up to 1001 1111111111 1111111111 1111111111 1111111111 1111111111<sub>b</sub> = 11258999068426239<sub>d</sub>, but values above 10<sup>16</sup> − 1 = 9999999999999999 = 2386F26FC0FFFF<sub>16</sub> = 100011100001101111001001101111110000001111111111111111<sub>2</sub> are 'illegal' and have to be treated as zeroes. To obtain the individual decimal digits, the significand has to be divided by 10 repeatedly.
In the above cases, the value represented is
: {{math|1=(−1)<sup>sign</sup> × 10<sup>exponent−398</sup> × significand}} <!-- Remember, significand is defined as an integer: 0 <= significand < 10^16 -->
=== DPD encoding ===
In this version, the significand is stored as a series of decimal digits. The leading digit is between 0 and 9 (3 or 4 binary bits), and the rest of the significand uses the [[densely packed decimal]] (DPD) encoding.
If the first two bits after the sign bit are "00", "01", or "10", then those are the leading bits of the exponent, and the three bits "cde" after that are interpreted as the leading decimal digit (0 to 7):
If the first two bits after the sign bit are "11", then the second 2-bits are the leading bits of the exponent, and the next bit "e" is prefixed with implicit bits "100" to form the leading decimal digit of the significand (8 or 9):
The remaining two combinations (11 110 and 11 111) of the 5-bit field after the sign bit are used to represent ±infinity and NaNs, respectively.
! rowspan="2" |Significand / Description
|-
! m<sub>12</sub>!! m<sub>11</sub>!! m<sub>10</sub>!! m<sub>9</sub>!! m<sub>8</sub>!! m<sub>7</sub>!! m<sub>6</sub>!! m<sub>5</sub>!! m<sub>4</sub>!! m<sub>3</sub>!! m<sub>2</sub>!! m<sub>1</sub>!! m<sub>0</sub>
|-
| colspan="16" |combination field not starting with '11', bits ab = 00, 01 or 10
|}
The resulting 'raw' exponent is a 10-bit binary integer whose leading bits are not '11', thus values 0 … 1011111111<sub>b</sub> = 0 … 767<sub>d</sub>, from which the appropriate bias is to be subtracted. The significand's leading decimal digit forms from the '''(0)cde''' or '''100e''' bits as a binary integer. The subsequent digits are encoded in the 10-bit 'declet' fields '''tttttttttt''' according to the DPD rules (see below). The full decimal significand is then obtained by concatenating the leading and trailing decimal digits.
The 10-bit DPD to 3-digit BCD transcoding for the declets is given by the following table. b<sub>9</sub> … b<sub>0</sub> are the bits of the DPD, and d<sub>2</sub> … d<sub>0</sub> are the three BCD digits. Be aware that the bit numbering used here, b<sub>9</sub> … b<sub>0</sub>, runs in the opposite direction to that used in the IEEE 754 standard document, b<sub>0</sub> … b<sub>9</sub>; additionally, the decimal digits are numbered 0-based here, while they run in the opposite direction and are 1-based in the IEEE 754 document. The bits on a white background do not contribute to the value but signal how to interpret / shift the other bits. The concept is to denote which digits are small (0 … 7) and encoded in three bits, and which are not and are then calculated from a prefix of '100' plus one bit specifying whether the digit is 8 or 9.
{{Densely packed decimal}}
The 8 decimal values whose digits are all 8s or 9s have four codings each.
The bits marked x in the table above are [[don't care|ignored]] on input, but will always be 0 in computed results.
In the above cases, with the ''true significand'' as the sequence of decimal digits decoded, the value represented is
:<math>(-1)^\text{signbit}\times 10^{\text{exponentbits}_2-398_{10}}\times \text{truesignificand}_{10}</math>
== History ==
decimal64 was formally introduced in the [[IEEE 754-2008 revision|2008 revision]]<ref name="IEEE-754_2008">{{cite book |author=IEEE Computer Society |url=https://ieeexplore.ieee.org/document/4610935 |title=IEEE Standard for Floating-Point Arithmetic |date=2008-08-29 |publisher=[[IEEE]] |isbn=978-0-7381-5753-5 |doi=10.1109/IEEESTD.2008.4610935 |id=IEEE Std 754-2008 |ref=CITEREFIEEE_7542008 |access-date=2016-02-08}}</ref> of the [[IEEE 754]] standard, which was taken over into the ISO/IEC/IEEE 60559:2011<ref name="ISO-60559_2011">{{Cite book |last=ISO/IEC JTC 1/SC 25 |url=https://www.iso.org/standard/57469.html |title=ISO/IEC/IEEE 60559:2011 — Information technology — Microprocessor Systems — Floating-Point arithmetic |date=June 2011 |publisher=ISO |pages=1–58}}</ref> standard.
== Less important information, side effects of the encoding ==
DPD encoding is quite efficient, wasting no more than about 2.4 percent of space compared to BID, because the 2<sup>10</sup> = 1024 possible values of a 10-bit declet are only slightly more than the 1000 needed to encode all numbers from 0 to 999.
Zero has 768 possible representations (1536 when counting signed zeroes, in two different cohorts), and even more when counting the 'illegal' significands which have to be treated as zeroes.
The gain in range and precision from the 'combination encoding' arises because the two bits taken from the exponent use only three of four possible states, and the 4 MSBs of the significand stay within 0000 … 1001 (10 of 16 possible states). In total that is {{math|1=3 × 10 = 30}} possible values when combined in one encoding, which is representable in 5 instead of 6 bits ({{tmath|1=2^5=32}}).
The decimalxxx formats include denormal values for a graceful degradation of precision near zero, but in contrast to the binaryxxx formats these are not marked by / do not need a special exponent; in decimal64 they are just values too small to have full 16-digit precision even with the smallest exponent.
In the cases of Infinity and NaN, all other bits of the encoding are ignored. Thus, it is possible to initialize an array to Infinities or NaNs by filling it with a single byte value.
== Addendum - code ==
Try the following to see the performance of addition and tangent for the bin32, bin64, dec32 and dec64 datatypes; compile and run instructions are in the header.
The output consists of: clock cycles taken; iterations; result; expression tested.
<pre>
// program to compare the performance of binary vs. decimal datatypes,
// WIP, covering elementary operations,
// requires 'libdfp' installed,
// compile with: 'gcc -I /usr/local/include/dfp decxx_perf_sample.c -o decxx_perf_sample -ldfp -lm -lquadmath'
// or - optimized - with: 'gcc -O2 -I /usr/local/include/dfp decxx_perf_sample.c -o decxx_perf_sample -ldfp -lm -lquadmath'
// run with: './decxx_perf_sample value_1 value_2 (count)'
// e.g. './decxx_perf_sample 8.0 5.0E-7 1000'
#define __STDC_WANT_IEC_60559_TYPES_EXT__
#define __STDC_WANT_DEC_FP__
#define __STDC_WANT_IEC_60559_DFP_EXT__
#include <fenv.h>
#include <stdio.h> // reg. e.g. printf,
#include <float.h>
#include <limits.h>
#include <math.h> // reg. e.g. pow,
#include <stdlib.h> // reg. e.g. atof,
#include <time.h> // reg. e.g. clock(),
#include <locale.h> // reg. formatted print of integers, not yet sufficient,
// #include <decimal.h> // reg. ???
#include <string.h> // reg. e.g. strcat,
#include <quadmath.h> // reg. e.g. quadmath_snprintf,
clock_t start1, end1;
#define TIMEITcf( expr, N ) \
start1 = clock(); \
for( int i = 1; i < N; ++i ) \
{ \
expr; \
} \
end1 = clock(); \
printf( "%07ld; %d; %.9E; %s \n", (long)( end1 - start1 ), N, expr, #expr )
#define TIMEITcd( expr, N ) \
start1 = clock(); \
for( int i = 1; i < N; ++i ) \
{ \
expr; \
} \
end1 = clock(); \
printf( "%07ld; %d; %.18E; %s \n", (long)( end1 - start1 ), N, expr, #expr )
#define TIMEITcDF( expr, N ) \
start1 = clock(); \
for(int i = 1; i < N; ++i) \
{ \
expr; \
} \
end1 = clock(); \
printf( "%07ld; %d; %.8HE; %s \n", (long)( end1 - start1 ), N, expr, #expr )
#define TIMEITcDD( expr, N ) \
start1 = clock(); \
for(int i = 1; i < N; ++i) \
{ \
expr; \
} \
end1 = clock(); \
printf( "%07ld; %d; %.17DE; %s \n", (long)( end1 - start1 ), N, expr, #expr )
int main( int argc, char *argv[] )
{
fe_dec_setround( 4 ); // round ties away from zero for decimal datatypes,
// setlocale(LC_ALL, "en_US"); // or any other locale that supports thousands separators
volatile float x1f = 0.0, x2f = 0.0, x3f = 0.0;
volatile double x1d = 0.0, x2d = 0.0, x3d = 0.0;
volatile _Decimal32 x1DF = 0.0DF, x2DF = 0.0DF, x3DF = 0.0DF;
volatile _Decimal64 x1DD = 0.0DD, x2DD = 0.0DD, x3DD = 0.0DD;
volatile int count = 1000000; // how many times to run the loop,
if( argc < 3 ) { printf( "usage: %s value_1 value_2 (count)\n", argv[ 0 ] ); return 1; }
if( argc > 3 ) count = atoi( argv[ 3 ] );
printf("add two values from command line arguments \n" );
printf("no benefit from '-O2'? \n" );
x1f = strtof( argv[ 1 ], NULL );
x2f = strtof( argv[ 2 ], NULL );
TIMEITcf( x1f = x1f + x2f, count );
x1d = strtod( argv[ 1 ], NULL );
x2d = strtod( argv[ 2 ], NULL );
TIMEITcd( x1d = x1d + x2d, count );
x1DF = strtod32( argv[ 1 ], NULL );
x2DF = strtod32( argv[ 2 ], NULL );
TIMEITcDF( x1DF = x1DF + x2DF, count );
x1DD = strtod64( argv[ 1 ], NULL );
x2DD = strtod64( argv[ 2 ], NULL );
TIMEITcDD( x1DD = x1DD + x2DD, count );
printf(" \n" );
printf("tangent of value from command line argument \n" );
printf("no benefit from '-O2'? \n" );
x1f = strtof( argv[ 1 ], NULL );
TIMEITcf( x3f = tanf( x1f ), count );
x1d = strtod( argv[ 1 ], NULL );
TIMEITcd( x3d = tan( x1d ), count );
x1DF = strtod32( argv[ 1 ], NULL );
TIMEITcDF( x3DF = tand32( x1DF ), count );
x1DD = strtod64( argv[ 1 ], NULL );
TIMEITcDD( x3DD = tand64( x1DD ), count );
printf(" \n" );
return 0;
}
</pre>
== See also ==
* [[ISO/IEC 10967]], Language Independent Arithmetic
* [[Primitive data type]]
* [[Q notation (scientific notation)|D (E) notation (scientific notation)]]
== References ==