Floating-point arithmetic: Difference between revisions

m fixed paragraph formatting, punctuation, and spelling
m removed parenthesis, fixed formatting
Line 12:
This scheme allows a large range of magnitudes to be represented within a given size of field, which is not possible in [[fixed-point]] notation.
 
As an example, a floating-point number with four decimal digits (''b'' = 10, ''p'' = 4) and an exponent range of ±4 could be used to represent 43210, 4.321, or 0.0004321, but would not have enough precision to represent 432.123 and 43212.3 (which would have to be rounded to 432.1 and 43210). Of course, in practice, the number of digits is usually larger than four.
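The rounding in this example can be reproduced with a short sketch. It assumes Python's standard <code>decimal</code> module as a stand-in for the four-digit format; the helper name <code>to_toy_float</code> and the choice of round-half-even are illustrative assumptions, not part of the description above.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# Toy format from the text: b = 10, p = 4, exponent range -4..+4.
# The rounding mode is an assumption; the text does not specify one.
ctx = Context(prec=4, Emin=-4, Emax=4, rounding=ROUND_HALF_EVEN)

def to_toy_float(text):
    """Round a decimal string to the nearest value of the toy format."""
    return ctx.create_decimal(text)

# These three values fit in four significant digits and are stored exactly.
exact = [to_toy_float(s) for s in ("43210", "4.321", "0.0004321")]

# These two need more than four digits and are rounded.
rounded = [to_toy_float(s) for s in ("432.123", "43212.3")]

for value in exact + rounded:
    print(value)
```

The last two values come out equal to 432.1 and 43210, matching the rounding described above.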
 
In addition, floating-point representations often include the special values +∞, −∞ (positive and negative infinity), and [[NaN]] ('Not a Number'). Infinities are used when results are too large to be represented, and NaNs indicate an invalid operation or undefined result.
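A minimal sketch of how these special values arise, assuming Python's built-in <code>float</code> is a 64-bit binary floating-point type (true on common platforms); the variable names are illustrative only.

```python
import math

big = 1e308                      # close to the largest finite double

# A result too large to be represented becomes an infinity.
overflowed = big * 10
neg_overflowed = -big * 10

# Subtracting infinity from infinity is an invalid operation: the result is NaN.
undefined = overflowed - overflowed

print(overflowed, neg_overflowed, undefined)
print(math.isinf(overflowed), math.isnan(undefined))

# A NaN compares unequal to everything, including itself.
print(undefined == undefined)
```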
 
=== Hidden bit ===
When using binary (''b'' = 2), one bit can be saved if all numbers are required to be normalized. The leading digit of the significand of a normalized binary floating-point number is always non-zero; in particular, it is always 1. This means that it does not need to be stored explicitly: for a normalized number it can be understood to be 1.
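The hidden bit can be observed by taking a stored number apart. The sketch below assumes the common 64-bit layout (1 sign bit, 11 exponent bits with a bias of 1023, 52 stored significand bits), which this section does not itself specify; the function names <code>decode_double</code> and <code>significand</code> are hypothetical.

```python
import struct

def decode_double(x):
    """Split a 64-bit float into its sign, exponent and fraction fields."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent_field = (bits >> 52) & 0x7FF
    fraction_field = bits & ((1 << 52) - 1)
    return sign, exponent_field, fraction_field

def significand(x):
    """Return the full 53-bit significand of a normalized number,
    restoring the leading 1 that is not stored."""
    _, exponent_field, fraction_field = decode_double(x)
    if exponent_field in (0, 0x7FF):
        raise ValueError("not a normalized number")
    return (1 << 52) | fraction_field

# 1.0 stores an all-zero fraction; the leading 1 is implied.
print(decode_double(1.0))
print(bin(significand(1.5)))
```

Only 52 significand bits are stored, yet the reconstructed significand has 53 bits of precision.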
 
Line 24:
 
== Usage in [[computing]] ==
While the examples above represent numbers in the [[decimal]] system (base of numeration ''b'' = 10), computers usually use the [[binary_numeral_system|binary]] system (''b'' = 2). In computers, floating-point numbers are sized by the number of [[bit|bits]] used to store them. This size is usually 32 bits or 64 bits, often called "single-precision" and "double-precision" respectively. A few machines offer larger sizes: Intel [[FPU|FPUs]] such as the [[Intel 8087]] (and its descendants integrated into the [[x86]] architecture) offer 80-bit floating-point numbers for intermediate results, and several systems offer 128-bit floating-point numbers, generally implemented in software.
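The practical difference between the two common sizes can be illustrated with a short sketch. It assumes Python's <code>struct</code> module, whose standard-size format codes <code>f</code> and <code>d</code> pack 32-bit and 64-bit floating-point values; the helper name <code>through_single</code> is hypothetical.

```python
import struct

# Storage size in bytes of each format (standard sizes, little-endian).
single_bytes = struct.calcsize("<f")
double_bytes = struct.calcsize("<d")

def through_single(x):
    """Round a Python float (64-bit) to 32 bits and convert it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

original = 0.1
narrowed = through_single(original)

print(single_bytes * 8, double_bytes * 8)   # sizes in bits
print(original == narrowed)                 # the 32-bit copy has lost precision
print(abs(original - narrowed))             # size of the rounding error
```

Values that are exactly representable in both sizes, such as 0.5, survive the round trip unchanged.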
 
== Problems with floating-point ==