Numeric precision in Microsoft Excel: Difference between revisions

Content deleted Content added
Mdruiter (talk | contribs)
m typo
Mdruiter (talk | contribs)
m removed personal message; casing
Line 12:
Round-off is the loss of accuracy when numbers that differ by small amounts are subtracted. Because each number has only fifteen significant digits, their difference is inaccurate when there aren't enough significant digits to express the difference.
 
</ref> [[truncation]], and [[Binary numeral system|binary storage]], accumulation of the deviations of the operands in calculations, and worst: cancellation at subtractions resp. 'catastrophicCatastrophic cancellation' at subtraction of values with similar magnitude.
 
==Accuracy and binary storage==
[[File:Excel fifteen figure.PNG|thumb|390px|Excel maintains 15 figures in its numbers, but they are not always accurate: mathematical the bottom line should be the same as the top line, in 'fp-math' the step '1 + 1/9000' leads to a rounding up as the first bit of the 14 bit tail '10111000110010' of the mantissa falling off the table when adding 1 is a '1', this uprounding is not undone when subtracting the 1 again, since there is no information about the origin of the values in this step. thusThus the 're-subtracting' of 1 leaves a mantissa ending in '100000000000000' instead of '010111000110010', representing a value of '1.1111111111117289E-4' rounded by ex$el to 15 significant digits: '1.11111111111173E-4'.]]
[[File:Excel errors.PNG|thumb|390px|Of course mathematical 1 + x − 1 = x, 'floating point math' is sometimes a little different, that's not to be blamed on ex$elExcel. The discrepancy indicates the error. All errors are beyond the 15-th significant digit of the intermediate 1+x value, all errors are in high value digits of the final result, that's the problematic effect of 'cancellation'.]]
In the top figure the fraction 1/9000 in Excel is displayed. Although this number has a decimal representation that is an infinite string of ones, Excel displays only the leading 15 figures. In the second line, the number one is added to the fraction, and again Excel displays only 15 figures. In the third line, one is subtracted from the sum using Excel. Because the sum has only eleven 1s after the decimal, the true difference when ‘1’ is subtracted is three 0s followed by a string of eleven 1s. However, the difference reported by Excel is three 0s followed by a 15-digit string of ''thirteen'' 1s and two extra erroneous digits. Thus, the numbers Excel calculates with are ''not'' the numbers that it displays. Moreover, the error in Excel's answer is not simply round-off error, it's an effect in floating point calculations called 'cancellation'.
 
The inaccuracy in Excel calculations is more complicated than errors due to a precision of 15 significant figures. Excel's storage of numbers in binary format also affects its accuracy.<ref name=deLevie>
Line 23:
{{cite book |title=Advanced Excel for scientific data analysis |publisher=Oxford University Press |author=Robert de Levie |author-link=Robert de Levie |year=2004 |isbn=0-19-515275-1 |page=44 |chapter=Algorithmic accuracy |url=https://www.amazon.com/Advanced-Excel-Scientific-Data-Analysis/dp/0195152751/ref=sr_1_1?ie=UTF8&s=books&qid=1270770876&sr=1-1#reader_0195152751}}
 
</ref> To illustrate, the lower figure tabulates the simple addition {{nowrap|1 + ''x'' − 1}} for several values of ''x''. All the values of ''x'' begin at the 15-th decimal, so Excel must take them into account. Before calculating the sum 1 + ''x'', Excel first approximates ''x'' as a binary number. If this binary version of ''x'' is a simple power of 2, the 15-digit decimal approximation to ''x'' is stored in the sum, and the top two examples of the figure indicate recovery of ''x'' without error. In the third example, ''x'' is a more complicated binary number, ''x'' = 1.110111⋯111 × 2<sup>−49</sup> (15 bits altogether). Here the 'IEEE 754 double value' resulting of the 15 bit figure is 3.330560653658221E-15, which is rounded! by excel for the 'user interface' to 15 digits 3.33056065365822E-15, and then displayed with 30 decimals digits gets one 'fake zero' added, thus the 'binary' and 'decimal' values in the sample are identical only in display, the values associated with the cells are different (1.1101111111111100000000000000000000000000000000000000 × 2<sup>−49</sup> vs. 1.1101111111111011111111111111111111111111111111111101 × 2<sup>−49</sup>). similarSimilar is done by other spreadsheets, the handling of the different amount of decimal digits which can be exactly stored in the 53 bit mantissa of a 'double' (e.g. 16 digits between 1 and 8, but only 15 between 0,5 and 1 and between 8 and 10) is somewhat difficult and solved 'suboptimal'. In the fourth example, ''x'' is a ''decimal'' number not equivalent to a simple binary (although it agrees with the binary of the third example to the precision displayed). The decimal input is approximated by a binary and then ''that'' decimal is used. These two middle examples in the figure show that some error is introduced.
 
The last two examples illustrate what happens if ''x'' is a rather small number. In the second from last example, ''x'' = 1.110111⋯111 × 2<sup>−50</sup>; 15 bits altogether. theThe binary is replaced very crudely by a single power of 2 (in this example, 2<sup>−49</sup>) and its decimal equivalent is used. In the bottom example, a decimal identical with the binary above to the precision shown, is nonetheless approximated differently from the binary, and is eliminated by truncation to 15 significant figures, making no contribution to {{nowrap|1 + ''x'' − 1}}, leading to ''x'' = 0.<ref name=decimal_input>
 
To input a number as binary, the number is submitted as a string of powers of 2: 2^(−50)*(2^0 + 2^−1 + ⋯). To input a number as decimal, the decimal number is typed in directly.
Line 55:
</ref>
 
1. theThe shortcomings in the '= 1 + x - 1' tasks are a combination of 'fp-math weaknesses' and 'how excel handles it', especially excels rounding. excelExcel does some rounding and / or 'snap to zero' for most of it's results, in average chopping the last 3 bits of the IEEE double representation. This behavior can be switched of by setting the formula in parentheses: '= ( 1 + 2^-52 - 1 )'. youYou will see that even that small value survives. smallerSmaller values will pass away as there are only 53 bits to represent the value, for this case 1.0000000000 0000000000 0000000000 0000000000 0000000000 01, the first representing the '1', and the last the '2^-52'.
Beforementioned concepts need some clarification, will do it in better detail once i have time for it.
 
2. itIt's not only clean powers of two surviving, but any combination of values constructed of bits which will be within the 53 bits once the decimal 1 is added. asAs most decimal values don't have a clean finite representation in binary they will suffer from 'round ofoff' and 'cancellation' in tasks like the above.
1. the shortcomings in the '= 1 + x - 1' tasks are a combination of 'fp-math weaknesses' and 'how excel handles it', especially excels rounding. excel does some rounding and / or 'snap to zero' for most of it's results, in average chopping the last 3 bits of the IEEE double representation. This behavior can be switched of by setting the formula in parentheses: '= ( 1 + 2^-52 - 1 )'. you will see that even that small value survives. smaller values will pass away as there are only 53 bits to represent the value, for this case 1.0000000000 0000000000 0000000000 0000000000 0000000000 01, the first representing the '1', and the last the '2^-52'.
 
E.g. decimal 0,1 has the IEEE double representation 0 (1).1001100110011001100110011001100110011001100110011010 times 2^-4, and added to 140737488355328.0 (which is 2^47) will loose all of it's bits except the first two. thusThus from '= ( 140737488355328.0 + 0,1 - 140737488355328.0) it will come back as 0.09375 calculated with www.weitz.de/ieee (64 bit) as well as in excel with the parentheses around the formula. This effect mostly can be managed by meaningful rounding, which excel doesn't apply, it's up to the user.
2. it's not only clean powers of two surviving, but any combination of values constructed of bits which will be within the 53 bits once the decimal 1 is added. as most decimal values don't have a clean finite representation in binary they will suffer from 'round of' and 'cancellation' in tasks like the above.
 
Needless to say: other spreadsheets have similar problems, LibreOffice Calc uses a more aggressive rounding, while gnumeric tries to keep precision and make as well the precision as the 'lack of' visible for the user.
E.g. decimal 0,1 has the IEEE double representation 0 (1).1001100110011001100110011001100110011001100110011010 times 2^-4, and added to 140737488355328.0 (which is 2^47) will loose all of it's bits except the first two. thus from '= ( 140737488355328.0 + 0,1 - 140737488355328.0) it will come back as 0.09375 calculated with www.weitz.de/ieee (64 bit) as well as in excel with the parentheses around the formula. This effect mostly can be managed by meaningful rounding, which excel doesn't apply, it's up to the user.
 
Needless to say: other spreadsheets have similar problems, LibreOffice Calc uses a more aggressive rounding, while gnumeric tries to keep precision and make as well the precision as the 'lack of' visible for the user.
 
 
Line 81 ⟶ 79:
is mathematically equivalent to:
 
:<math>\sqrt{ \frac{ n\Sigma x^2 - \left( \Sigma x \right) ^2 }{n^2} } \ . </math>
 
However, the first form keeps better numerical accuracy for large values of ''x'', because squares of differences between ''x'' and ''x''<sub>av</sub> leads to less round-off than the differences between the much larger numbers Σx<sup>2</sup> and (Σx)<sup>2</sup>. The built-in Excel function STDEVP(), however, uses the less accurate formulation because it is faster computationally.<ref name=Levie>
Line 100 ⟶ 98:
:<math>B1: = A1 - A2</math>
:<math>B2: = A2 - A3</math>
Both cells <math>B1</math> and <math>B2</math> display <math>1.1530</math>.
However, if cell <math>C1</math> contains the formula <math>B1 - B2</math>
then <math>C1</math> does not display <math>0</math> as would be expected,
but displays <math>-3.55271E-15</math> instead.
 
(above is not limited to subtractions, try '=1+1,405*2^-48' in one cell, excel rounds the display to 1,00000000000000000000, and '=0,9+225179982494413*2^-51' in another, same display (in the range above 1 / below 1 the rounding is different, which hits most decimal or binary magnitude changes). above, different rounding for value and display, violates one of the elementary requirements in [https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html Goldberg: 'What Every Computer Scientist Should Know About Floating-Point Arithmetic'] (more or less 'the holy book' of fp-math), there stated: 'it is important to make sure that its use is transparent to the user. For example, on a calculator, if the internal representation of a displayed value is not rounded to the same precision as the display, then the result of further operations will depend on the hidden digits and appear unpredictable to the user' (the problem is not limited to excel, e.g. LibreOffice calc acts similar).
 
===Round-off error===
Line 112 ⟶ 110:
The solutions (the roots) of this equation are exactly determined by the [[quadratic formula]]:
 
:<math>x= \frac{-b \pm \sqrt{b^2-4ac} }{2a}. </math>
 
When one of these roots is very large compared to the other, that is, when the square root is close to the value ''b'', the evaluation of the root corresponding to subtraction of the two terms becomes very inaccurate due to round-off (cancellation?).
Line 118 ⟶ 116:
It is possible to determine the round-off error by using the [[Taylor series]] formula for the square root:
<ref name="Zwillinger_2014">{{cite book |author-first1=Izrail Solomonovich |author-last1=Gradshteyn |author-link1=Izrail Solomonovich Gradshteyn |author-first2=Iosif Moiseevich |author-last2=Ryzhik |author-link2=Iosif Moiseevich Ryzhik |author-first3=Yuri Veniaminovich |author-last3=Geronimus |author-link3=Yuri Veniaminovich Geronimus |author-first4=Michail Yulyevich |author-last4=Tseytlin |author-link4=Michail Yulyevich Tseytlin |author-first5=Alan |author-last5=Jeffrey |editor-first1=Daniel |editor-last1=Zwillinger |editor-first2=Victor Hugo |editor-last2=Moll |translator=Scripta Technica, Inc. |title=Table of Integrals, Series, and Products |publisher=[[Academic Press, Inc.]] |date=2015 |orig-year=October 2014 |edition=8 |language=English |isbn=0-12-384933-0 <!--|ISBN=978-0-12-384933-5 --> |lccn=2014010276 <!-- |url=https://books.google.com/books?id=NjnLAwAAQBAJ |access-date=2016-02-21-->|title-link=Gradshteyn and Ryzhik |chapter=1.112. Power series |page=25}}</ref>
:<math>\sqrt{b^2-4ac} = b \ \sqrt{1-\frac{4ac}{b^2}} \approx b \left( 1 -\frac{2ac}{b^2} + \frac{2 a^2 c^2 }{b^4} + \cdots \right ). </math>
 
Consequently,
Line 128 ⟶ 126:
becomes smaller and smaller. The numbers for ''b'' and the square root become nearly the same, and the difference becomes small:
 
:<math>b - \sqrt{b^2-4ac} \approx b - b + \varepsilon. </math>
 
Under these circumstances, all the significant figures go into expressing ''b''. For example, if the precision is 15 figures, and these two numbers, ''b'' and the square root, are the same to 15 figures, the difference will be zero instead of the difference ε.
Line 138 ⟶ 136:
</ref> If we denote the two roots by ''r''&thinsp;<sub>1</sub> and ''r''&thinsp;<sub>2</sub>, the quadratic equation can be written:
 
:<math>\left(x - r_1\right) \left( x - r_2 \right) = x^2 - \left( r_1 + r_2 \right) x + r_1 \ r_2 = 0. </math>
 
When the root ''r''&thinsp;<sub>1</sub> >> ''r''&thinsp;<sub>2</sub>, the sum (''r''&thinsp;<sub>1</sub> + ''r''&thinsp;<sub>2</sub>&thinsp;) ≈ ''r''&thinsp;<sub>1</sub> and comparison of the two forms shows approximately:
:<math> r_1 \approx -\frac{b}{a}, </math>
while
:<math> r_1 \ r_2 = \frac{c}{a}. </math>
Thus, we find the approximate form:
:<math>r_2 = \frac {c}{a \ r_1} \approx -\frac {c}{b}. </math>
These results are not subject to round-off error, but they are not accurate unless ''b''<sup>2</sup> is large compared to&nbsp; ''ac''.
 
[[File:Excel quadratic error.PNG|thumb|350px| Excel graph of the difference between two evaluations of the smallest root of a quadratic: direct evaluation using the quadratic formula (accurate at smaller ''b'') and an approximation for widely spaced roots (accurate for larger ''b''). The difference reaches a minimum at the large dots, and round-off causes squiggles in the curves beyond this minimum.]]
 
The bottom line is that in doing this calculation using Excel, as the roots become farther apart in value, the method of calculation will have to switch from direct evaluation of the quadratic formula to some other method so as to limit round-off error. The point to switch methods varies according to the size of coefficients ''a'' and&nbsp; ''b''.
 
In the figure, Excel is used to find the smallest root of the quadratic equation ''x''<sup>2</sup>&nbsp;+&nbsp;''bx''&nbsp;+&nbsp;''c''&nbsp;=&nbsp;0 for ''c''&nbsp;=&nbsp;4 and&nbsp;''c''&nbsp;=&nbsp;4&nbsp;×&nbsp;10<sup>5</sup>. The difference between direct evaluation using the quadratic formula and the approximation described above for widely spaced roots is plotted ''vs.'' ''b''. Initially the difference between the methods declines because the widely spaced root method becomes more accurate at larger ''b''-values. However, beyond some ''b''-value the difference increases because the quadratic formula (good for smaller ''b''-values) becomes worse due to round-off, while the widely spaced root method (good for large ''b''-values) continues to improve. The point to switch methods is indicated by large dots, and is larger for larger ''c''-values. At large ''b''-values, the upward sloping curve is Excel's round-off error in the quadratic formula, whose erratic behavior causes the curves to squiggle.