Floating-point arithmetic: Difference between revisions

Tags: Mobile edit Mobile web edit Advanced mobile edit
Internal representation: update concerning the implicit bit
Line 177:
 
=== Internal representation ===
Floating-point numbers are typically packed into a computer datum as the sign bit, the exponent field, and a field for the significand, from left to right. For the [[IEEE 754]] binary formats (basic and extended) that have extant hardware implementations, they are apportioned as follows:
 
{| class="wikitable" style="text-align:right; border:0"
Line 250:
While the exponent can be positive or negative, in binary formats it is stored as an unsigned number that has a fixed "bias" added to it. Values of all 0s in this field are reserved for the zeros and [[subnormal number]]s; values of all 1s are reserved for the infinities and NaNs. The exponent range for normal numbers is [−126, 127] for single precision, [−1022, 1023] for double, or [−16382, 16383] for quad. Normal numbers exclude subnormal values, zeros, infinities, and NaNs.
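
The biased encoding described above can be illustrated with a minimal Python sketch for single precision. The helper names <code>exponent_field</code> and <code>classify</code> are hypothetical, chosen only for this illustration, and the bias value 127 is the single-precision bias, which the text above does not state explicitly.

```python
import struct

BIAS = 127  # assumed here: the exponent bias of IEEE 754 single precision

def exponent_field(x: float) -> int:
    """Return the raw 8-bit biased exponent field of x stored as binary32."""
    # Pack x as a big-endian 32-bit float, then reread the bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    # The exponent field occupies the 8 bits just below the sign bit.
    return (bits >> 23) & 0xFF

def classify(x: float) -> str:
    """Interpret the exponent field according to the reserved values."""
    e = exponent_field(x)
    if e == 0:
        return "zero or subnormal"   # all 0s are reserved
    if e == 0xFF:
        return "infinity or NaN"     # all 1s are reserved
    return f"normal, exponent {e - BIAS}"  # remove the bias
```

Under these assumptions, <code>classify(1.0)</code> reports an exponent of 0, and <code>classify(2.0**-126)</code> reports −126, the smallest exponent of a normal single-precision number.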
 
In the IEEE binary interchange formats, the leading bit of a normalized significand is not actually stored in the computer datum, since it is always 1. It is called the "hidden" or "implicit" bit. Because of this, the single-precision format actually has a significand with 24 bits of precision, the double-precision format has 53, quad has 113, and octuple has 237.
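
As a rough sketch of how the implicit bit is restored when decoding, the following Python function rebuilds the full 24-bit significand of a normal single-precision value from its 23 stored bits. The function name <code>significand24</code> is hypothetical, and the sketch deliberately rejects zeros, subnormals, infinities, and NaNs, for which the leading bit is not an implied 1.

```python
import struct

def significand24(x: float) -> int:
    """Return the 24-bit integer significand of a normal binary32 value."""
    # Pack x as a big-endian 32-bit float, then reread the bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF  # the 23 significand bits actually stored
    if exponent in (0, 0xFF):
        raise ValueError("not a normal number")
    # Prepend the implicit leading 1 to obtain 24 bits of precision.
    return (1 << 23) | fraction
```

For instance, 0.15625 equals 1.25 × 2<sup>−3</sup>, so only the fractional part .25 is stored, and <code>significand24(0.15625)</code> yields <code>0xA00000</code>, which is 1.25 scaled by 2<sup>23</sup>.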
 
For example, it was shown above that π, rounded to 24 bits of precision, has: