Talk:Floating-point arithmetic/Archive 4: Difference between revisions

Revision as of 07:00, 15 February 2012 by MiszaBot I (talk \| contribs): m Archiving 5 thread(s) from Talk:Floating point.
Latest revision as of 20:21, 9 August 2017 by Deacon Vorbis (talk \| contribs): m Deacon Vorbis moved page Talk:Floating point/Archive 4 to Talk:Floating-point arithmetic/Archive 4: Talk archive wasn't moved with rest of page
(12 intermediate revisions by 2 users not shown)
:I suppose you put a link to [[GNU Multi-Precision Library]] in the see also but it doesn't seem to me to warrant anything in the main article. [[User:Dmcq\|Dmcq]] ([[User talk:Dmcq\|talk]]) 20:51, 8 January 2011 (UTC)

== Alternatives to the FP representation ==

There should be a mentioning of [[continued fractions]] in that section. Software implementations of it [http://blog.poucet.org/2008/02/continued-fractions-in-haskell/ already exist], and it has incredible properties, especially together with lazily-evaluated languages like Haskell. [[User:Whitehorses2501\|Whitehorses2501]] ([[User talk:Whitehorses2501\|talk]]) 00:15, 29 April 2011 (UTC)
:Only if we had some notability and a reliable source. That thing you pointed at was a blog of someone's efforts and they haven't even figured out how to multiply the square root of 2 by itself yet. [[User:Dmcq\|Dmcq]] ([[User talk:Dmcq\|talk]]) 08:18, 29 April 2011 (UTC)

== formula to calculate Range of floating-point numbers ==

The '''Range of floating-point numbers''' section says
{{quotation\|Positive floating-point numbers in this format have an approximate range of 10<sup>−308</sup> to 10<sup>308</sup> (because 308 is approximately 1023 × log<sub>10</sub>(2), since the range of the exponent is [−1022,1023]). The complete range of the format is from about −10<sup>308</sup> through +10<sup>308</sup> (see [[IEEE 754]]).}}
Would this be more clear if expressed as:
{{quotation\|... because 308 is approximately log<sub>10</sub>(2<sup>1023</sup>) ...}}
The latter is more consistent with the earlier (in Overview) notation of ''value'' = ''s'' × ''b<sup>e</sup>'' (where ''b=2'', ''e''=1023).
[[Special:Contributions/63.116.23.136\|63.116.23.136]] ([[User talk:63.116.23.136\|talk]]) 05:10, 1 July 2011 (UTC)
{{done\|Done [[User:Mitch Ames\|Mitch Ames]] ([[User talk:Mitch Ames\|talk]]) 02:29, 31 July 2011 (UTC)}}

== Unnecessary precision ==

As requested after someone stuck in loads of extra digits of pi I have set up a section in this talk page for discussion if somebody else thinks loads of digits which have nothing to do with the topic are a good idea. Until then the consensus of people on the matter from the history is pretty apparent and a big long uninteresting and irrelevant string of digits should not be put in. [[User:Dmcq\|Dmcq]] ([[User talk:Dmcq\|talk]]) 17:35, 5 October 2011 (UTC)
:In addition, everyone involved needs to read and follow [[Wikipedia:Edit warring]]. I have placed warnings on the userpages of everyone who is at 2RR.
:Derek farn, the consensus is against you on this one. Davidhorman, Dmcq and Guy Macon all agree that going ten digits past the number of digits in the single precision example is enough to get the point across, and that more digits than that detract from the article. [[User:Guy Macon\|Guy Macon]] ([[User talk:Guy Macon\|talk]]) 21:19, 5 October 2011 (UTC)
::Well perhaps you could explain why you want so many digits in yourself. Having a load of unnecessary digits just encourages people to add extra ones as far as I can see as a kind of pointy comment on the length. Why did you want some digits in grey past the first seven in bold and do we really need thirty digits to see the difference? [[User:Dmcq\|Dmcq]] ([[User talk:Dmcq\|talk]]) 23:45, 5 October 2011 (UTC)

== Software of book side of history ==

I just reverted a bit about the Pilot Ace in history because it used software to emulate floating point. However it occurs to me that there might be something worthwhile in the bit about J.H.Wilkinson, ''Rounding errors in algebraic processes''.
Is there evidence about who wrote a book about floating point or that this was a particular turning point? [[User:Dmcq\|Dmcq]] ([[User talk:Dmcq\|talk]]) 18:46, 6 February 2012 (UTC)

== IEEE 754 ==

I have added a section discussing the "big picture" on the rationale and use for the IEEE 754 features which often gets lost when discussing the details. I plan to add specific references for the points made there (from Kahan's web site). It would be good to expand the examples and add additional ones as well. [[User:Brianbjparker\|Brianbjparker]] ([[User talk:Brianbjparker\|talk]]) <span style="font-size: smaller;" class="autosigned">—Preceding [[Wikipedia:Signatures\|undated]] comment added 11:22, 19 February 2012 (UTC).</span><!--Template:Undated--> <!--Autosigned by SineBot-->
:You need to cite something saying these were accepted rationales for it. Citations point to specific books, journals or newspapers and preferably page number ranges. [[User:Dmcq\|Dmcq]] ([[User talk:Dmcq\|talk]]) 13:51, 19 February 2012 (UTC)
Added direct citations as requested. [[User:Brianbjparker\|Brianbjparker]] ([[User talk:Brianbjparker\|talk]]) <span style="font-size: smaller;" class="autosigned">—Preceding [[Wikipedia:Signatures\|undated]] comment added 18:20, 19 February 2012 (UTC).</span><!--Template:Undated--> <!--Autosigned by SineBot-->
:Thanks. My feeling about Kahan and his diatribe against Java is that he just doesn't get what programmers have to do when testing a program. Having a switch to enable lax typing of intermediate results where you know it will only be run in environments you've tested is a good idea but that wasn't what Java was originally designed for. The section about extended precision there seems undue in length as I'm pretty certain other considerations like signed zero and denormal handling were the main original considerations where it differed from previous implementations.
[[User:Dmcq\|Dmcq]] ([[User talk:Dmcq\|talk]]) 20:37, 19 February 2012 (UTC) :Although I referenced Kahan's Java paper several times, I certainly didn't want this section to appear as a slight against Java. Kahan has several other papers discussing the need for extended precision that do not mention Java-- I will replace the current references with those in the near future, and try to trim it down (although I don't think that that reference is a diatribe against Java, just against its numerics). I certainly didn't want to get into the tradeoffs between improved numerical precision of results versus exact reproducibility in Java in this section. I do however think that it is important to clarify the intended use of the IEEE754 features in an introductory article like this, which can get lost in detailed descriptions of the features. In particular, I find that there is wide misunderstanding of the intended use of, and need for, extended precision amongst the programming community, particularly as extended precision was historically not supported in several RISC processors, and thus it is underused by programmers, even when targeting the x86 platform for e.g. HPC (even when these same programmers would carry additional significant figures for intermediate calculations if doing the same computations by hand, as alluded to in this section). Also, Kahan's descriptions of work on the design of the x87 (based on his experience designing HP calculators which use extended precision internally) makes it clear that extended precision was intended as a key feature (indeed a recommended feature) of IEEE754, compared with previous implementations. [[User:Brianbjparker\|Brianbjparker]] ([[User talk:Brianbjparker\|talk]]) 00:56, 20 February 2012 (UTC) :As far as I'm aware the main other rationales were ::To have a sound mathematical basis in that results were correctly rounded versions of accurate results and also so reasoning about the calculations would be easier. 
::Round to even was used to improve accuracy. In fact this is much more important than extended precision if the double storage mode is only used for intermediate calculations. Using extended precision only gives about one extra bit overall at the end if values in arrays are in doubles. The main reason I believe they were put in was it made calculating mathematical functions much easier and more accurate, they can also be used in inner routines with benefit.
::Biased rounding was put in I believe to support interval arithmetic - another part of being able to guarantee the results of calculations. [[User:Dmcq\|Dmcq]] ([[User talk:Dmcq\|talk]]) 15:43, 20 February 2012 (UTC)
:::''Using extended precision only gives about one extra bit overall at the end if values in arrays are in doubles''. This is false in general; you must be thinking of some special cases where not many intermediate calculations happen before rounding to double for storage. For a counterexample, e.g. consider a loop to take a dot product of two double-precision arrays (not using Kahan summation etc.) [[User:Stevenj\|— Steven G. Johnson]] ([[User talk:Stevenj\|talk]]) 21:16, 20 February 2012 (UTC)
::::You would normally get very little advantage in that case over round to even with so few intermediate calculations. And for longer calculations round to even wins over just using a longer mantissa and rounding down. You only get a worthwhile gain if the storage is in extended precision. [[User:Dmcq\|Dmcq]] ([[User talk:Dmcq\|talk]]) 21:53, 20 February 2012 (UTC)
:::::That is certainly not the case in general. The examples you are thinking of are using simple exactly rounded single arithmetic expressions-- the advantage of extended precision is avoiding loss of precision in more complicated numerically unstable formulae-- e.g. it is easy to construct examples where even computing a quadratic formula discriminant can cause massive loss of ULP when computed in double but not in double extended.
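A minimal sketch of the kind of cancellation described in this comment, under stated assumptions: the coefficients are illustrative values chosen so that b² is almost equal to a·c, the quadratic is written as a·x² − 2·b·x + c so the discriminant is b² − a·c, and exact rational arithmetic from Python's `fractions` module stands in for wider intermediate precision, because standard Python floats offer no double-extended type. It is not taken from any of the cited references.

```python
from fractions import Fraction

def discriminant_double(a, b, c):
    """b*b - a*c with every intermediate rounded to double precision."""
    return b * b - a * c

def discriminant_wide(a, b, c):
    """Same formula with exact rational intermediates, rounded once at the end.

    Exact arithmetic is used here only as a stand-in for carrying extra
    precision in the intermediate results.
    """
    fa, fb, fc = Fraction(a), Fraction(b), Fraction(c)
    return float(fb * fb - fa * fc)

# Illustrative coefficients (all exactly representable as doubles).
a, b, c = 94906265.625, 94906267.0, 94906268.375

d_double = discriminant_double(a, b, c)
d_wide = discriminant_wide(a, b, c)
print("double intermediates:", d_double)
print("wide intermediates:  ", d_wide)
```

Both products exceed 2^53, where adjacent doubles are 2 apart, so the all-double result can only be an even integer, while the true discriminant is 1.375² = 1.890625.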
Several examples are given in the Kahan references. This is in addition to the advantage of the extended exponent in avoiding overflow in e.g. dot products. [[User:Brianbjparker\|Brianbjparker]] ([[User talk:Brianbjparker\|talk]]) 00:16, 22 February 2012 (UTC) :::When you say ''Round to even was used to improve accuracy.'', I take it you are mainly referring to the exact rounding: breaking ties by round to even does avoid some additional statistic biases but it is rather subtle (might be worth mentioning the main text though..). [[User:Brianbjparker\|Brianbjparker]] ([[User talk:Brianbjparker\|talk]]) 00:16, 22 February 2012 (UTC) ::: ''Biased rounding was put in I believe to support interval arithmetic''. Yes, I believe directed rounding was included to support interval arithmetic, but also for debugging numerical stability issues-- if an algorithm gives drastically different results under round to + and - infinity then it is likely unstable. [[User:Brianbjparker\|Brianbjparker]] ([[User talk:Brianbjparker\|talk]]) 00:16, 22 February 2012 (UTC) ::: ''As far as I'm aware the main other rationales were... to have a sound mathematical basis in that results were correctly rounded versions of accurate results and also so reasoning about the calculations would be easier.''. Yes, the exact rounding is an important point-- I have added some additional text earlier in the article to expand on this. It is true that, like previous arithmetics, having a precise specification to allow expert numerical analysts to write robust libraries was an important consideration, but the unique aspect of IEEE-754 is that it was also aimed at a broad market of non-expert users and so I focused in the section on the robustness features relevant to that (I will add some text highlighting that aspect as well though). 
[[User:Brianbjparker\|Brianbjparker]] ([[User talk:Brianbjparker\|talk]]) 00:16, 22 February 2012 (UTC)
::::Well exact rounding, but I thought it better to specify the precise format they have. The point is that rounding rather than truncating is what really matters. With rounding the error only tends to go up as the square root of the number of operations whereas with directed rounding it goes up linearly. Even the reduction of bias by round to even matters in this. You always get something else putting in a little bias so it is not as good as this but directed rounding is really bad. You're better off just perturbing the original figures for stability checking.
::::The mathematical basis makes it much easier to do things like construct longer precision arithmetic packages easily, in fact the fused multiply is particularly useful for this. [[User:Dmcq\|Dmcq]] ([[User talk:Dmcq\|talk]]) 00:27, 22 February 2012 (UTC)
:::::The use of directed rounding for diagnosis of stability issues is discussed here http://www.cs.berkeley.edu/~wkahan/Stnfrd50.pdf and in other references at that web site. It also discusses why perturbation alone is not as useful. IEEE 754-2008 annex B states this explicitly-- "B.2 Numerical sensitivity: Debuggers should be able to alter the attributes governing handling of rounding or exceptions inside subprograms, even if the source code for those subprograms is not available; dynamic modes might be used for this purpose. For instance, changing the rounding direction or precision during execution might help identify subprograms that are unusually sensitive to rounding, whether due to ill-condition of the problem being solved, instability in the algorithm chosen, or an algorithm designed to work in only one rounding-direction attribute. The ultimate goal is to determine responsibility for numerical misbehavior, especially in separately-compiled subprograms.
The chosen means to achieve this ultimate goal is to facilitate the production of small reproducible test cases that elicit unexpected behavior." [[User:Brianbjparker\|Brianbjparker]] ([[User talk:Brianbjparker\|talk]]) 01:04, 22 February 2012 (UTC) ::::::The uses that somebody makes of features is quite a different thing from the rationale for why somebody would pay to have them implemented. The introduction to the standard gives a succinct summary of the main reasons for the standard. I'll just copy the latest here so you can see :a) Facilitate movement of existing programs from diverse computers to those that adhere to this standard as well as among those that adhere to this standard. :b) Enhance the capabilities and safety available to users and programmers who, although not expert in numerical methods, might well be attempting to produce numerically sophisticated programs. :c) Encourage experts to develop and distribute robust and efficient numerical programs that are portable, by way of minor editing and recompilation, onto any computer that conforms to this standard and possesses adequate capacity. Together with language controls it should be possible to write programs that produce identical results on all conforming systems. :d) Provide direct support for ::― execution-time diagnosis of anomalies ::― smoother handling of exceptions ::― interval arithmetic at a reasonable cost. :e) Provide for development of ::― standard elementary functions such as exp and cos ::― high precision (multiword) arithmetic ::― coupled numerical and symbolic algebraic computation. :f) Enable rather than preclude further refinements and extensions. ::::::There are other things but this is what the basic rationale was and is. Directed rounding was for interval arithmetic. [[User:Dmcq\|Dmcq]] ([[User talk:Dmcq\|talk]]) 01:56, 22 February 2012 (UTC) :::::::Thanks. 
Actually, I believe that "d) Provide direct support for― execution-time diagnosis of anomalies" is referring to this use of directed rounding to diagnose numerical instability. Certainly Kahan makes it clear that he considered it a key usage from the early design of the x87. I agree that its use for interval arithmetic was also considered from the beginning. [[User:Brianbjparker\|Brianbjparker]] ([[User talk:Brianbjparker\|talk]]) 02:11, 22 February 2012 (UTC) ::::::::No that refers to identification and methods of notifying the various exceptions and the handling of the signalling and quiet NaNs. Your reference from 2007 does not support in any way that arbitrarily jiggling the calculations using directed rounding was considered as a reason to include directed rounding in the specification. He'd have been just laughed at if he had justified spending money on the 8087 for such a purpose when there are easy ways of doing something like that without any hardware assistance. [[User:Dmcq\|Dmcq]] ([[User talk:Dmcq\|talk]]) 08:23, 22 February 2012 (UTC) == Trivia removed == I removed about that the full precision of extended precision is attained when extended precision is used. The point about the algorithm is it converges using the precision used. We don't need to put in the precisions of single double and extended precision versions of the algorithm. [[User:Dmcq\|Dmcq]] ([[User talk:Dmcq\|talk]]) 23:23, 23 February 2012 (UTC) :::I disagree that it is trivia-- it is a good example to also illustrate the earlier discussions on the usage of extended precision. In any case, to make it easier to find for those who may be interested in the information: the footnote to the final example, giving the precision using double extended for internal calculations, is included here- :::"As the recurrence is applied repeatedly, the accuracy improves at first, but then it deteriorates. 
It never gets better than about 8 digits, even though 53-bit arithmetic should be capable of about 16 digits of precision. When the second form of the recurrence is used, the value converges to 15 digits of precision. Footnote: if intermediate calculations are carried at a higher precision using double extended (x87 80 bit) format, it reaches 18 digits of precision, which is the full target double precision." [[User:Brianbjparker\|Brianbjparker]] ([[User talk:Brianbjparker\|talk]]) 23:37, 23 February 2012 (UTC)
:It just has nothing to do with extended precision. The first algorithm would go wrong just as badly with extended precision and the second one behaves exactly like double. There is nothing of note here. Why should it have all the various precisions in? The same thing would happen with float or quad precision. All it says is that the precision for different precisions is different. Also a double cannot hold 18 digits of precision, used as an intermediate for double you'd at most get one bit of precision extra. [[User:Dmcq\|Dmcq]] ([[User talk:Dmcq\|talk]]) 00:50, 25 February 2012 (UTC)
::::Agreed that the footnote does nothing to clarify the particular point being made by that example-- that wasn't the aim though. The intention was to also utilise the example to demonstrate the utility of computing intermediate values to higher precision than needed by the final destination format to limit the effects of round-off. In that sense it is an example for the earlier discussion on extended precision (and also the section of approaches to improve accuracy). Perhaps the text "Footnote: if intermediate calculations are carried at a higher precision using double extended (x87 80 bit) format, it reaches 18 digits of precision, which is the full target double precision (see discussion on extended precision above)." would be clearer.
Agreed it is not the most striking example of this, but still demonstrates the idea-- perhaps a separate, more striking and specific example would be preferable, I will see what I can find. [[User:Brianbjparker\|Brianbjparker]] ([[User talk:Brianbjparker\|talk]]) 04:52, 25 February 2012 (UTC)
:::::It does not illustrate that. What gives you the idea it does? If anything it is an argument against what was said before. Using extended precision in the intermediate calculation and storing back as double does not give increased precision in the final result. The 18 digits only applies to the extended precision, it does not apply to the double result. The 18 digits is not the target precision of a double. A double can only hold 15 digits accurately. There is no way to stick the extra precision of the extended precision into the target double. [[User:Dmcq\|Dmcq]] ([[User talk:Dmcq\|talk]]) 09:53, 25 February 2012 (UTC)
::::::IEEE 754 double precision gives from 15 to 17 decimal digits of precision (17 digits if round-tripping from double to text back to double). When the example is computed with extended precision it gives 17 decimal digits of precision, so if the returned double was to be used for further computation it would have less roundoff error, in ULP (at least one extra decimal digit worth). Although, as you say, if the double result is printed to 15 decimal digits this extra precision will be lost. I agree that it is not a compelling example-- a better example could show a difference in many decimal significant digits due to internal extended precision. [[Special:Contributions/121.45.205.130\|121.45.205.130]] ([[User talk:121.45.205.130\|talk]]) 23:21, 25 February 2012 (UTC)
:::::::The 17 digits for a round trip is only needed to cope with making certain that rounding works okay. The actual precision is just less than 16 digits, about 15.95 if one cranks the figures. Printing has nothing to do with it.
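The digit counts traded in this exchange can be checked with a short sketch. It assumes CPython floats are IEEE 754 doubles (true on mainstream platforms) and uses `0.1 + 0.2` as an arbitrary test value; the variable names are illustrative only.

```python
import math
import sys

# Number of significand bits in a Python float (53 for IEEE 754 double).
bits = sys.float_info.mant_dig

# Equivalent number of decimal digits: 53 * log10(2), just under 16.
decimal_digits = bits * math.log10(2)
print(bits, decimal_digits)

# An arbitrary test value whose shortest decimal form needs 17 digits.
x = 0.1 + 0.2

# 17 significant digits always survive a double -> text -> double round trip.
roundtrip_17 = float(format(x, ".17g"))

# 15 significant digits are not always enough to recover the same double.
roundtrip_15 = float(format(x, ".15g"))

print(roundtrip_17 == x, roundtrip_15 == x)
```

The 17-digit figure is therefore a property of round-tripping through text, while the information actually held in the significand corresponds to a little under 16 decimal digits.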
I was just talking about the 53 bits of precision information held within double precision format expressed as decimal digits. You can't shove any more information into the bits. The value there is about 1 ulp out and using extended precision would gain that back. This is what I was saying about extended precision being very useful for getting accurate maths functions, straightforward implementations in double will very often be 1 ulp out without special work whereas the extended precision result will very often give the value given by rounding the exact value. [[User:Dmcq\|Dmcq]] ([[User talk:Dmcq\|talk]]) 00:08, 26 February 2012 (UTC) ::::::::::Ideally, what should be added is a more striking example of using excess precision in intermediate computations to protect against numerical instability. The current one can indeed demonstrate this if excess precision is carried to IEEE quad precision, in which case the numerical unstable version gives good results. I have added notes to that effect which will do as an example for now. There are many examples also showing this using only double extended (e.g. even as simple as computing the roots of a quadratic equation), and I will add such an example in the future.. but not for a while (by the way, I think double extended adds more than 1 ULP but I haven't checked that). [[User:Brianbjparker\|Brianbjparker]] ([[User talk:Brianbjparker\|talk]]) 06:54, 26 February 2012 (UTC) :::::::::::That's not true either because how does one know when to stop? Using quadruple precision would still diverge. [[User:Dmcq\|Dmcq]] ([[User talk:Dmcq\|talk]]) 11:45, 26 February 2012 (UTC) ::::::::::::::::Yes that is so- once it does reach the correct value it stays there for several iterations (at double precision) but does eventually diverge from it again, so a stopping criterion of when the value does not change at double precision could be used. 
But yes, I am not completely happy with that example for that reason-- feel free to remove it if you feel it is misleading. Actually Kahan has several very compelling examples in his notes-- I will post one here in the next week or so. [[User:Brianbjparker\|Brianbjparker]] ([[User talk:Brianbjparker\|talk]]) 14:41, 26 February 2012 (UTC)
The use of extra precision can be illustrated easily using differentiation. If the result is to be single precision then using double precision for all the calculations is a good idea because of the loss of significance when subtracting two values of the function. [[User:Dmcq\|Dmcq]] ([[User talk:Dmcq\|talk]]) 12:00, 26 February 2012 (UTC)
::: ok yes, that could be a good example-- I will see what I can come up with. [[User:Brianbjparker\|Brianbjparker]] ([[User talk:Brianbjparker\|talk]]) 14:41, 26 February 2012 (UTC)
: I have added an example from Kahan's publications-- I think this is a good example as it demonstrates the massive roundoff error (up to half signif. digits lost) that can occur with even innocuous-looking formulae, and shows the two main methods to correct or improve that: increased internal precision, or numerical analysis. [[User:Brianbjparker\|Brianbjparker]] ([[User talk:Brianbjparker\|talk]]) 07:03, 28 February 2012 (UTC)
::Yes it is definitely better to source something like that to a good source like him. I may not agree with every last word he says about it but he definitely is the premier source for anything on floating point. [[User:Dmcq\|Dmcq]] ([[User talk:Dmcq\|talk]]) 14:14, 28 February 2012 (UTC)

== 01010111 01101000 01100001 01110100 00101110 00101110 00101110 00111111 (What...?) ==

The section on internal representation does not explain how decimals are converted to floating-point values. I think it will be helpful if we add a step-by-step procedure that the computer follows. Thanks!
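For readers who want to see what ends up stored after such a conversion, here is a minimal sketch that splits a double into its three fields. It assumes the IEEE 754 binary64 layout (1 sign bit, 11 exponent bits with bias 1023, 52 fraction bits); `decompose` and `recompose` are hypothetical helper names, not part of any library. Packing with an explicit big-endian format means the byte order of the host machine does not matter.

```python
import math
import struct

def decompose(x):
    """Return (sign, biased_exponent, fraction) of a double as integers."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                  # top bit
    exponent = (bits >> 52) & 0x7FF    # next 11 bits, biased by 1023
    fraction = bits & ((1 << 52) - 1)  # low 52 bits
    return sign, exponent, fraction

def recompose(sign, exponent, fraction):
    """Rebuild a normal (not zero, subnormal, infinite or NaN) double."""
    # The significand has an implicit leading 1 in front of the fraction bits.
    significand = (1 << 52) + fraction
    value = math.ldexp(significand, exponent - 1023 - 52)
    return -value if sign else value

for value in (1.0, -2.0, 0.1):
    s, e, f = decompose(value)
    print(value, s, e, hex(f), recompose(s, e, f) == value)
```

The decimal literal `0.1` is first rounded to the nearest representable double; the fields printed for it describe that rounded value, not the decimal fraction itself.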
[[Special:Contributions/68.173.113.106\|68.173.113.106]] ([[User talk:68.173.113.106\|talk]]) 02:16, 25 February 2012 (UTC) :This gives an example of conversion and the articles on the particular formats give other examples. Wikipedia does not in general provide step by step procedures, it describes things, see [[WP:NOTHOWTO]]. [[User:Dmcq\|Dmcq]] ([[User talk:Dmcq\|talk]]) 02:24, 25 February 2012 (UTC) ::I just thought it was kind of unclear. Besides, doing so might actually help this article get to GA status. ::You see, I'm trying to design an algorithm for getting the mantissa, the exponent, and the sign of a <code>float</code> or <code>double</code>. So in case anyone else actually cares about that stuff. For the record, the storage is little-endian, so you have to reverse the bit order. [[Special:Contributions/68.173.113.106\|68.173.113.106]] ([[User talk:68.173.113.106\|talk]]) 02:50, 25 February 2012 (UTC) :::It would stop FA status. Have a look at the articles about the individual formats. They describe in quite enough details the format. Any particular algorithm is up to the user, they are not interesting or discussed in secondary sources. [[User:Dmcq\|Dmcq]] ([[User talk:Dmcq\|talk]]) 10:01, 25 February 2012 (UTC) :::The closest in Wikipedia for the sort of stuff you're talking about is if somebody wrote something for wikibooks. Have you had a look at the various external sites? Really to me what you're talking about sounds like some homework exercise and we shouldn't help with those except perhaps to give hints. [[User:Dmcq\|Dmcq]] ([[User talk:Dmcq\|talk]]) 10:20, 25 February 2012 (UTC) == imho, "real numbers" is didactically misleading == I'd like to propose to change the beginning of the first sentence, because the limited amount of bits in the significand only allows for storing rational binary numbers. Because two is a prime factor of ten, this means only rational decimal numbers can be stored as well. 
Concluding, I'd like to propose to replace "real" by "rational" there. [[User:Drgst\|Drgst]] ([[User talk:Drgst\|talk]]) 13:17, 25 February 2012 (UTC) :Definitely not. That is a bad idea. They are approximations to real numbers. The concept of rational number just doesn't come into it. That they are rational is just a side effect. [[User:Dmcq\|Dmcq]] ([[User talk:Dmcq\|talk]]) 14:32, 25 February 2012 (UTC) ::In the section 'Some other computer representations for non-integral numbers' there are some systems that can represent some irrational numbers. for instance a logarithmic system does not necessarily represent rational numbers. [[User:Dmcq\|Dmcq]] ([[User talk:Dmcq\|talk]]) 14:36, 25 February 2012 (UTC) :::Sorry for the delayed answer, Dmcq, it seems I forgot to tick the "watch page" checkbox... now for the content: IEEE FP numbers definitely are rational numbers. Even the most simple irrational number in the world, i.e. sqrt(2), cannot be represented, e.g. Any mathematical theorem that really depends on the existence of irrational numbers does not hold for the set of FP numbers. Nevertheless, you are right in stating that FP numbers are meant to approximate real numbers. Yet, as no non-rational number can be represented, transcendental numbers are far from being representable. Of course, this has serious consequences: for example, none of these nice trigonometric identities involving pi or pi/2 can be used naively without introducing large errors. This is just a simple example of why I think people should be warned of associating floating point numbers with real numbers.[[User:Drgst\|Drgst]] ([[User talk:Drgst\|talk]]) 21:14, 27 June 2012 (UTC) ::::"Irrational numbers are those real numbers that cannot be represented as terminating or repeating decimals." --[[Irrational number]] Therefore, irrational numbers ''cannot be exactly represented on any digital computer''. However, you can get arbitrarily close. 
It really doesn't take all that many bits to handle a Planck length (~10^-35m) and the estimated size of the universe (~10^26m) in the same calculation. ::::The key point here is that floating point really is a method of representing (not perfectly but arbitrarily close) real numbers. Yes, it just so happens that some of them are represented exactly and others are not, but that's not relevant to the fact that FP is a method of representing (imperfectly) real numbers. All of this is covered quite nicely in the "Representable numbers, conversion and rounding" section. No need to make the lead confusing and misleading. --[[User:Guy Macon\|Guy Macon]] ([[User talk:Guy Macon\|talk]]) 22:48, 27 June 2012 (UTC) :::::I don't think this is correct "floating point really is a method of representing (not perfectly but arbitrarily close) real numbers". We talk about the "representable numbers" as those real numbers which can be represented exactly within the system. Other real numbers are rounded to some representable number. So I think we should either speak in terms of "working with real numbers" (which seems a little vague) or "representing approximations to real numbers" (as we do later in the article). --[[User:JakeVortex\|Jake]] ([[User talk:JakeVortex\|talk]]) 08:50, 22 October 2012 (UTC) ::::::You make a good point, but while "working with real numbers" is inexact and vague, "representing approximations to real numbers" is wordy and clumsy. Perhaps we can devise a third alternative? --[[User:Guy Macon\|Guy Macon]] ([[User talk:Guy Macon\|talk]]) 12:57, 22 October 2012 (UTC) :::::::What about "approximating real numbers"? But IMHO, "real numbers" is slightly incorrect, because floating point can also be used for complex arithmetic (though a complex number is here seen as a pair of two real numbers). Moreover a floating-point arithmetic is not just about the representation, but also the behavior when doing an operation (e.g. how the result is rounded). 
So, I would prefer something like: "a method of doing numerical computations" [[User:Vincent Lefèvre\|Vincent Lefèvre]] ([[User talk:Vincent Lefèvre\|talk]]) 22:09, 22 October 2012 (UTC)

== Guard bits ==

Anybody know where the business of needing three extra bits comes from? For addition one only needs a guard/round digit plus a sticky bit as the sticky bit will always be zero if subtraction means you have to shift up. And for multiplication one needs the double length to cope with carry properly before rounding - but one can still cut that down to two bits before applying the particular rounding. The literature talks about guard and round and sticky so I'm not disputing putting it in the text, just wondering why people got the idea in their heads in the first place. [[User:Dmcq\|Dmcq]] ([[User talk:Dmcq\|talk]]) 13:03, 8 March 2012 (UTC)
:Somewhat related: Take a look at "2 vs 3 guard bits" here:
:http://www.engineering.uiowa.edu/~carch/lectures07/55035-070404-prn.pdf
:Also interesting:
:http://www.google.com/patents/US4282582.pdf
:These two searches turn up some interesting pages:
:[http://www.google.com/search?q=%22floating+point%22+%2240+bits%22 <nowiki>http://www.google.com/search?q="floating+point"+"40+bits"</nowiki>]
:[http://www.google.com/search?q=%22floating+point%22+%22eight+guard+bits%22+%22DSP%22 <nowiki>http://www.google.com/search?q="floating+point"+"eight+guard+bits"+"DSP"</nowiki>]
:--[[User:Guy Macon\|Guy Macon]] ([[User talk:Guy Macon\|talk]]) 00:39, 9 March 2012 (UTC)
::Goldberg gives a discussion of the need for two guard digits in http://www.validlab.com/goldberg/paper.pdf (page 195). There is a very clear description with example cases in: Michael L. Overton (2001). Numerical Computing with IEEE Floating Point Arithmetic. SIAM. [[User:Brianbjparker\|Brianbjparker]] ([[User talk:Brianbjparker\|talk]]) 06:17, 9 March 2012 (UTC)
:::Very good reference.
It should be noted that he not only covers base 10 and guard (decimal) digits but also base 2 and guard bits. --[[User:Guy Macon|Guy Macon]] ([[User talk:Guy Macon|talk]]) 07:02, 9 March 2012 (UTC)
:::I just looked at an implementation of the whole business I did ages ago and I did actually use three bits! Just me forgetting what I'd done, sorry. Yes, the subtraction does actually require them all. [[User:Dmcq|Dmcq]] ([[User talk:Dmcq|talk]]) 11:33, 9 March 2012 (UTC)

== edit : computation in page is correct after all ==

Sorry for the confusion: I used t_(i+1) instead of t_i. For that reason I missed a factor of 2: 2^(i+1) = 2 * 2^i. <small><span class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[User:KeesLem|KeesLem]] ([[User talk:KeesLem|talk]] • [[Special:Contributions/KeesLem|contribs]]) 14:36, 21 February 2013 (UTC)</span></small><!-- Template:Unsigned --> <!--Autosigned by SineBot-->

== Justification for division by zero definition ==

I [http://en.wikipedia.org/w/index.php?title=Division_by_zero&diff=511812597&oldid=510158610 recently added] to [[division by zero]] this statement with an appropriate source:
:"The justification for this definition is to preserve the sign of the result in case of [[arithmetic underflow]]. For example, in the single-precision computation 1/(''x''/2), where ''x'' = ±2<sup>−149</sup>, the computation ''x''/2 underflows and produces ±0 with sign matching ''x'', and the result will be ±∞ with sign matching ''x''. The sign will match that of the exact result ±2<sup>150</sup>, but the magnitude of the exact result is too large to represent, so infinity is used to indicate overflow."
Provided this is valid, I wonder if it could also be added in some relevant location in the body of floating point related articles. In general I'd like to see more information on design rationales. Thanks! [[User:Dcoetzee|Dcoetzee]] 07:42, 11 September 2012 (UTC)

== Signed zero section, branch cuts ==

The section on signed zero (under Internal representation >> Special values >> Signed zero) says the following: "The difference between +0 and −0 is mostly noticeable for complex operations at so-called [[Branch cut|branch cuts]]." In a strictly mathematical sense, +0/-0 ''can'' be interpreted as describing the limiting behaviors of a function, but that's not actually what's happening here. Moreover, branch cuts are not the only situation where these exceptional limiting behaviors appear; one can have branch cuts without exceptional limiting behaviors of this sort; and none of the examples given in the section are actually branch cuts. As far as I can tell, there is absolutely no significance to the relationship between branch cuts in complex analysis and signed zero in floating point numerical representations, but I wanted to make sure there wasn't a good reason for this being here. Thoughts? [[Special:Contributions/71.227.119.236|71.227.119.236]] ([[User talk:71.227.119.236|talk]]) 15:25, 29 September 2012 (UTC)
:Result of a quick Google search:
:"A system with signed zero can distinguish between asin(5+0i) and asin(5-0i) and pick the appropriate branch cut continuous with quadrant I or quadrant IV, respectively. A system without signed zero cannot distinguish and, according to the choses the branch cut such that it is continuous with quadrant IV (consistent with the rule of CCC). So, for asin(5+0i) it will return the same value as a system with signed zero would for asin(5-0i)." -Richard B. Kreckel ( [ http://www.ginac.de/~kreckel/ ] [ http://lists.gnu.org/archive/html/bug-gsl/2011-12/msg00004.html ] ).
:I think that when he wrote "according to the" he meant "accordingly" (probably not a native English speaker). 
--[[User:Guy Macon|Guy Macon]] ([[User talk:Guy Macon|talk]]) 23:34, 29 September 2012 (UTC)
::Somewhat straying from the subject but still quite interesting: the "Signed Zero" section of "What Every Computer Scientist Should Know About Floating-Point Arithmetic" ( [ http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html ] ) --[[User:Guy Macon|Guy Macon]] ([[User talk:Guy Macon|talk]]) 23:41, 29 September 2012 (UTC)

== imho, the computation for Pi as shown actually computes only Pi/2 ==

The algorithm as shown to compute an approximation of Pi actually computes, imo, only Pi/2 in this form, even though the output shown contains an approximation for Pi. I think either the values should be halved or the formula should be changed into: 12 * 2^i * t_i [[User:KeesLem|KeesLem]] ([[User talk:KeesLem|talk]]) 15:16, 21 February 2013 (UTC) <span style="font-size: smaller;" class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/130.161.210.156|130.161.210.156]] ([[User talk:130.161.210.156|talk]]) </span><!-- Template:Unsigned IP --> <!--Autosigned by SineBot-->
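
The factor in question can be checked numerically. The following is only a minimal Python sketch, not the article's code: it assumes the recurrence under discussion starts from t_0 = 1/sqrt(3) = tan(pi/6) and halves the angle at each step, and it uses the rearranged form t_(i+1) = t_i / (sqrt(t_i^2 + 1) + 1), which avoids cancellation. The name <code>pi_estimates</code> is made up for illustration. Under those assumptions, 6 * 2^i * t_i approaches Pi, so 12 * 2^i * t_i would approach 2*Pi, which agrees with the "computation in page is correct after all" note above.

```python
import math


def pi_estimates(steps):
    """Yield 6 * 2**i * t_i for i = 0 .. steps-1.

    Assumed recurrence (hypothetical reconstruction, see lead-in):
    t_0 = tan(pi/6) = 1/sqrt(3), and each step applies the tangent
    half-angle identity, so t_i = tan(pi / (6 * 2**i)).
    """
    t = 1.0 / math.sqrt(3.0)
    for i in range(steps):
        yield 6.0 * 2.0**i * t
        # Half-angle step in the form that avoids subtracting
        # two nearly equal numbers.
        t = t / (math.sqrt(t * t + 1.0) + 1.0)


estimates = list(pi_estimates(25))
for i in (0, 5, 10, 24):
    # Print the estimate next to its error against math.pi.
    print(i, estimates[i], estimates[i] - math.pi)
```

Using t_(i+1) where t_i is meant pairs the tangent of the halved angle with the old power of two, which is where an apparent factor of 2 can come from.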