Content deleted Content added
inserted "notelist" template |
Adding short description: "Details of data storage in a spreadsheet application" |
||
(7 intermediate revisions by 7 users not shown) | |||
Line 1:
{{Short description|Details of data storage in a spreadsheet application}}
As with other spreadsheets, [[Microsoft Excel]] works only to limited accuracy because it retains only a certain number of figures to describe numbers (it has limited [[Arithmetic precision|precision]]). With some exceptions regarding erroneous values, infinities, and denormalized numbers, Excel calculates in [[double-precision floating-point format]] from the [[IEEE 754-2008|IEEE 754 specification]]<ref name=microsoft_spec>
{{cite web
</ref> (besides numbers, Excel uses a few other data types<ref name=Dalton>
{{cite book |author=Dalton, Steve |year=2007 |chapter=Table 2.3: Worksheet data types and limits |title=Financial Applications Using Excel Add-in Development in C/C++|pages=13–14 |isbn=978-0-470-02797-4 |edition=2nd |publisher=Wiley |chapter-url=https://books.google.com/books?id=ABUSU9PWUuIC&pg=PA13}}
</ref>). Although Excel allows ''display'' of up to 30 decimal places, its ''precision'' for any specific number is no more than 15 [[significant figures]], and calculations may have an accuracy that is even less due to five issues: [[round-off error|round off]],{{efn|
name=roundoff|
Round-off is the loss of accuracy when numbers that differ by small amounts are subtracted. Because each number has only fifteen significant digits, their difference is inaccurate when there aren't enough significant digits to express the difference.
}}
==Accuracy and binary storage==
[[File:Excel fifteen figure.PNG|thumb|390px|Excel maintains 15
[[File:Excel errors.PNG|thumb|390px|Of course mathematical {{nobr|{{math| 1 + ''x'' − 1 {{=}} ''x''}} ,}} 'floating point math' is sometimes a little different, that is not to be blamed on Excel. The discrepancy indicates the error. All errors are beyond the 15
In the top figure the fraction 1/9000 in Excel is displayed. Although this number has a decimal representation that is an infinite string of ones, Excel displays only the leading 15 figures. In the second line, the number one is added to the fraction, and again Excel displays only 15 figures. In the third line, one is subtracted from the sum using Excel. Because the sum has only eleven 1s after the decimal, the true difference when ‘1’ is subtracted is three 0s followed by a string of eleven 1s. However, the difference reported by Excel is three 0s followed by a 15
The inaccuracy in Excel calculations is more complicated than errors due to a precision of 15 significant figures. Excel's storage of numbers in binary format also affects its accuracy.<ref name=deLevie-ch-p44>
{{cite book
|author=de Levie, Robert |author-link=Robert de Levie
|year=2004
|chapter=Algorithmic accuracy
|title=Advanced Excel for scientific data analysis
|publisher=Oxford University Press
|isbn=0-19-515275-1 |page=44
}}
</ref>
To illustrate, the lower figure tabulates the simple addition {{nobr|{{math| 1 + ''x'' − 1}} }} for several values of {{mvar|x}}. All the values of {{mvar|x}} begin at the 15 th decimal, so Excel must take them into account. Before calculating the sum {{nobr|{{math| 1 + ''x''}} ,}} Excel first approximates {{mvar|x}} as a binary number. If this binary version of {{mvar|x}} is a simple power of 2, the 15 digit decimal approximation to {{mvar|x}} is stored in the sum, and the top two examples of the figure indicate recovery of {{mvar|x}} without error. In the third example, {{mvar|x}} is a more complicated binary number, {{nobr|{{math| ''x'' {{=}} 1.110111⋯111 × 2{{sup|−49}} }} }} (15 bits altogether). Here the 'IEEE 754 double value' resulting of the 15 bit figure is 3.330560653658221E-15, which is rounded by Excel for the 'user interface' to 15 digits 3.33056065365822E-15, and then displayed with 30 decimals digits gets one 'fake zero' added, thus the 'binary' and 'decimal' values in the sample are identical only in display, the values associated with the cells are different (1.1101111111111100000000000000000000000000000000000000 × 2<sup>−49</sup> vs. 1.1101111111111011111111111111111111111111111111111101 × 2<sup>−49</sup>). Similar is done by other spreadsheets, the handling of the different amount of decimal digits which can be exactly stored in the 53 bit mantissa of a 'double' (e.g. 16 digits between 1 and 8, but only 15 between {{sfrac|1|2}} and 1 and between 8 and 10) is somewhat difficult and solved 'suboptimal'. In the fourth example, {{mvar|x}} is a ''decimal'' number not equivalent to a simple binary (although it agrees with the binary of the third example to the precision displayed). The decimal input is approximated by a binary and then ''that'' decimal is used. These two middle examples in the figure show that some error is introduced.
The last two examples illustrate what happens if {{mvar|x}} is a rather small number. In the second from last example, {{nobr|{{math|''x'' {{=}} 1.110111⋯111 × 2{{sup|−50}} }} ;}} 15 bits altogether. The binary is replaced very crudely by a single power of 2 (in this example, 2{{sup|−49}}) and its decimal equivalent is used. In the bottom example, a decimal identical with the binary above to the precision shown, is nonetheless approximated differently from the binary, and is eliminated by truncation to 15 significant figures, making no contribution to {{nobr|{{math|1 + ''x'' − 1}} ,}} leading to {{nobr|{{math|''x'' {{=}} 0}} .}}{{efn|
name=decimal_input|
To input a number as binary, the number is submitted as a string of powers of 2: 2^(−50)*(2^0 + 2^−1 + ⋯). To input a number as decimal, the decimal number is typed in directly.
}}
For {{mvar|x}}′s that are not simple powers of 2, a noticeable error in {{nobr|{{math| 1 + ''x'' − 1}} }} can occur even when {{mvar|x}} is quite large. For example, if {{nobr|{{math|''x'' {{=}} {{sfrac|1|1000}} }} ,}} then {{math|{{nobr|1 + ''x'' − 1}} {{=}} {{nobr|9.99999999999'''''89''''' × 10{{sup|−4}} ,}} }} an error in the 13 th significant figure. In this case, if Excel simply added and subtracted the decimal numbers, avoiding the conversion to binary and back again to decimal, no round-off error would occur and accuracy actually would be better. Excel has the option to "Set precision as displayed".{{efn|
name=discuss|
This option is found on the "Excel options"|"Advanced" tab. See KB 214118.<ref>{{cite web
|series=How to correct rounding errors
|title=Method 2
|website=Microsoft support
|id=KB 214118
|url=http://support.microsoft.com/kb/214118}}</ref>
}}
With this option, depending upon circumstance, accuracy may turn out to be better or worse, but you will know exactly what Excel is doing. (Only the selected precision is retained, and one cannot recover extra digits by reversing this option.) Some similar examples can be found at this link.<ref name =arithmetic>
{{cite news
|title=Excel addition strangeness
|website=office-watch.com
|url=https://office-watch.com/2008/excel-sum-anomaly
}}
</ref>
In short, a variety of accuracy behavior is introduced by the combination of representing a number with a limited number of binary digits, along with [[Truncation error|truncating]] numbers beyond the fifteenth significant figure.<ref name=deLevie-2004>
{{cite book
|author=de Levie, Robert |author-link=Robert de Levie
|year=2004
|title=Advanced Excel for scientific data analysis
|publisher=Oxford University Press
|isbn=0-19-515275-1
|pages=45–46
}}
</ref><!-- These figures are simply screen shots of the listed arithmetic using Excel 2007 --> Excel's treatment of numbers beyond 15 significant figures sometimes contributes better accuracy to the final few significant figures of a computation than working directly with only 15 significant figures, and sometimes not.
For the reasoning behind the conversion to binary representation and back to decimal, and for more detail about accuracy in Excel and VBA consult these links.
name=accuracy_links Accuracy in Excel:
* {{cite web
|title=Floating point arithmetic may give inaccurate results
|id=KB 78113
|website=Microsoft support
|date=6 June 2024
|url=http://support.microsoft.com/kb/78113/en-us
}} — A detailed explanation with examples of the binary/15 sig fig storage consequences.
* {{cite web
|title=Why does Excel seem to give wrong answers?
|date=2008-04-10 |df=dmy-all
|series=Understanding floating point precision
|website=Microsoft Developers' Network
|type=blog
|url=http://blogs.msdn.com/excel/archive/2008/04/10/understanding-floating-point-precision-aka-why-does-excel-give-me-seemingly-wrong-answers.aspx
|archive-url=https://web.archive.org/web/20100330212040/http://blogs.msdn.com/excel/archive/2008/04/10/understanding-floating-point-precision-aka-why-does-excel-give-me-seemingly-wrong-answers.aspx
|archive-date=2010-03-30
}} — Another detailed discussion with examples and some fixes.
* {{cite journal
|author=Goldberg, David
|date=March 1991
|title=What every computer scientist should know about floating point
|journal=Computing Surveys
|doi=10.1145/103162.103163
|type=edited reprint
|id=E19957-01 / 806-3568
|via=Sun Microsystems
|url=http://docs.sun.com/source/806-3568/ncg_goldberg.html
|url-access=subscription
}} — Focuses upon examples of floating point representations of numbers.
* {{cite web
|title=Visual Basic and arithmetic precision
|id=Q279 / 7 / 55
|website=Microsoft support
|url=http://support.microsoft.com/default.aspx?scid=http://support.microsoft.com:80/support/kb/articles/Q279/7/55.ASP&NoWebContent=1
}} — Oriented toward VBA, which does things a bit differently.
* {{cite book
|author=Liengme, Bernard V.
|year=2008
|chapter=Mathematical limitations of Excel
|title=A guide to Microsoft Excel 2007 for scientists and engineers
|page=31 ff
|isbn=978-0-12-374623-8
|publisher=Academic Press
|chapter-url=https://books.google.com/books?id=0qDm7uuDmv0C&pg=PA31
|via=Google Books
}}
}}
1. The shortcomings in the
2. It is not only clean powers of two surviving, but any combination of values constructed of bits which will be within the 53
E.g. decimal 0
Needless to say, other spreadsheets have similar problems, LibreOffice Calc uses a more aggressive rounding, while gnumeric tries to keep precision and make as well the precision as the 'lack of' visible for the user.
Examples where precision is no indicator of accuracy{{Expand section|date=April 2010}}
Line 69 ⟶ 129:
[[File:Excel Std Dev Error.PNG|thumb|450px|Error in Excel 2007 calculation of standard deviation. All four columns have the same deviation of 0.5]]
Accuracy in Excel-provided functions can be an issue. [[Micah Altman|Altman]] ''et al
{{cite book
|author1=Altman, Micah |author1-link=Micah Altman
|author2=Gill, Jeff
|author3=McDonald, Michael
|year=2004
|chapter=§2.1.1 Revealing example: Computing the coefficient standard deviation
|title=Numerical Issues in Statistical Computing for the Social Scientist
|publisher=Wiley-IEEE
|isbn=0-471-23633-0
|chapter-url=https://books.google.com/books?id=j_KevqVO3zAC&pg=PA12
|page=12
}}
</ref> The population standard deviation given by:
Line 81 ⟶ 150:
:<math>\sqrt{ \frac{ n\Sigma x^2 - \left( \Sigma x \right) ^2 }{n^2} } \ .</math>
However, the first form keeps better numerical accuracy for large values of
Both the "compatibility" function {{code|STDEVP}} and the "consistency" function {{code|STDEV.P}} in Excel
===Subtraction of Subtraction Results===
Doing simple subtractions may lead to errors as two cells may display the same numeric value while storing two separate values.
An example of this occurs in a sheet where the following cells are set to the following numeric values:
:<math>A1 := 28.552</math>
:<math>A2 := 27.399</math>
:<math>A3 := 26.246</math>
and the following cells contain the following formulas
:<math>B1 : = A1 - A2</math>
:<math>B2 : = A2 - A3</math>
Both cells <math>B1</math> and <math>B2</math> display <math>1.1530</math>.
However, if cell <math>C1</math> contains the formula <math>B1 - B2</math>
Line 103 ⟶ 168:
but displays <math>-3.55271E-15</math> instead.
The above is not limited to subtractions, try {{code|{{=}} 1 + 1.405*2^(-48)}} in one cell, Excel rounds the display to 1,00000000000000000000, and {{code|{{=}} 0.9 + 225179982494413×2^(-51)}} in another, same display{{efn|
Rounding is different in the range above 1 vs. below 1, which impacts on most decimal or binary magnitude changes.
}}
above, different rounding for value and display, violates one of the elementary requirements in Goldberg (1991)<ref>
{{cite journal
|author=Goldberg, David
|date=March 1991
|title=What every computer scientist should know about floating point
|journal=Computing Surveys
|doi=10.1145/103162.103163
|type=edited reprint
|id=E19957-01 / 806-3568
|via=Sun Microsystems
|url=https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
|url-access=subscription
}} — more or less 'the holy book' of fp-math
</ref>
who states:
: ... 'it is important to make sure that its use is transparent to the user. For example, on a calculator, if the internal representation of a displayed value is not rounded to the same precision as the display, then the result of further operations will depend on the hidden digits and appear unpredictable to the user' ...
The problem is not limited to Excel, e.g. LibreOffice calc acts similarly.
===Round-off error===
Line 115 ⟶ 200:
It is possible to determine the round-off error by using the [[Taylor series]] formula for the square root:
<ref name=Zwillinger-Moll-etal-2014>
{{cite book
|first1=I.S. |last1=Gradshteyn |author1-link=Izrail Solomonovich Gradshteyn
|first2=I.M. |last2=Ryzhik |author2-link=Iosif Moiseevich Ryzhik
|first3=Yu.V. |last3=Geronimus |author3-link=Yuri Veniaminovich Geronimus
|first4=M.Yu. |last4=Tseytlin |author4-link=Michail Yulyevich Tseytlin
|first5=A. |last5=Jeffrey
|date=2015 |orig-year=October 2014
|chapter=1.112. Power series
|editor1-first=Daniel |editor1-last=Zwillinger
|editor2-first=Victor Hugo |editor2-last=Moll |editor2-link=Victor Hugo Moll
|title=Tables of Integrals, Series, and Products
|title-link=Gradshteyn and Ryzhik
|translator=Scripta Technica, Inc.
|publisher=[[Academic Press, Inc.]]
|edition=8
|language=En
|isbn=978-0-12-384933-5
|lccn=2014010276 <!-- |url=https://books.google.com/books?id=NjnLAwAAQBAJ
|access-date=2016-02-21-->
|page=25
}} {{ISBN|0-12-384933-0}}
</ref>
:<math>\sqrt{b^2-4ac} = b \ \sqrt{1-\frac{4ac}{b^2}} \approx b \left( 1 -\frac{2ac}{b^2} + \frac{2 a^2 c^2 }{b^4} + \cdots \right ).</math>
Line 130 ⟶ 237:
Under these circumstances, all the significant figures go into expressing ''b''. For example, if the precision is 15 figures, and these two numbers, ''b'' and the square root, are the same to 15 figures, the difference will be zero instead of the difference ε.
A better accuracy can be obtained from a different approach, outlined below.
name=Step_response This approximate method is used often in the design of feedback amplifiers, where the two roots represent the response times of the system. See the article on [[step response]].
}}
:<math>\left(x - r_1\right) \left( x - r_2 \right) = x^2 - \left( r_1 + r_2 \right) x + r_1 \ r_2 = 0.</math>
Line 152 ⟶ 259:
In the figure, Excel is used to find the smallest root of the quadratic equation ''x''<sup>2</sup> + ''bx'' + ''c'' = 0 for ''c'' = 4 and ''c'' = 4 × 10<sup>5</sup>. The difference between direct evaluation using the quadratic formula and the approximation described above for widely spaced roots is plotted ''vs.'' ''b''. Initially the difference between the methods declines because the widely spaced root method becomes more accurate at larger ''b''-values. However, beyond some ''b''-value the difference increases because the quadratic formula (good for smaller ''b''-values) becomes worse due to round-off, while the widely spaced root method (good for large ''b''-values) continues to improve. The point to switch methods is indicated by large dots, and is larger for larger ''c''-values. At large ''b''-values, the upward sloping curve is Excel's round-off error in the quadratic formula, whose erratic behavior causes the curves to squiggle.
A different field where accuracy is an issue is the area of [[Numerical integration|numerical computing of integrals]] and the [[Numerical ordinary differential equations|solution of differential equations]]. Examples are [[Simpson's rule]], the [[Runge–Kutta method]], and the Numerov algorithm for the [[Schrödinger equation]].<ref name=Blom-2002>
{{cite report
|author=Blom, Anders
|year=2002
|title=Computer algorithms for solving the Schrödinger and Poisson equations
|series=Department of Physics
|publisher=[[Lund University]]
|url=https://www.researchgate.net/publication/242226580
}}
</ref>
Using Visual Basic for Applications, any of these methods can be implemented in Excel. Numerical methods use a grid where functions are evaluated. The functions may be interpolated between grid points or extrapolated to locate adjacent grid points. These formulas involve comparisons of adjacent values. If the grid is spaced very finely, round-off error will occur, and the less the precision used, the worse the round-off error. If spaced widely, accuracy will suffer. If the numerical procedure is thought of as a [[Negative feedback amplifier|feedback system]], this calculation noise may be viewed as a signal that is applied to the system, which will lead to instability unless the system is carefully designed.<ref name=Hamming-1986>
{{cite book
|author=Hamming, R.W. |author-link=Richard Hamming
|year=1986
|chapter=Chapter 21 – Indefinite integrals – feedback
|title=Numerical Methods for Scientists and Engineers
|edition=2nd
|publisher=Courier Dover Publications
|isbn=0-486-65241-6
|page=357
|chapter-url=https://books.google.com/books?id=Y3YSCmWBVwoC&pg=PA357
|url=https://archive.org/details/numericalmethods00hamm_0
|url-access=registration
}} — This book discusses round-off, truncation and stability extensively. For example, see chapter 21, page 357.
</ref>
===Accuracy within VBA===
Although Excel nominally works with [[byte|8-byte]] numbers by default, [[Visual Basic for Applications|VBA]] has a variety of data types. The ''Double'' data type is 8 bytes, the ''Integer'' data type is 2 bytes, and the general purpose 16 byte ''Variant'' data type can be converted to a 12 byte ''Decimal'' data type using the VBA conversion function ''CDec''.<ref name=
{{cite book
|author=Walkenbach, John
|year=2010
|chapter=Defining data types
|title=Excel 2010 Power Programming with VBA
|pages=198 ff & Table 8-1
|isbn=978-0-470-47535-5
|publisher=Wiley
|chapter-url=https://books.google.com/books?id=dtSdrjjVXrwC&pg=PA198
}}
</ref>
Choice of variable types in a VBA calculation involves consideration of storage requirements, accuracy and speed.
==Footnotes==
|