Content deleted Content added
Line 11:
To give an example, a common way to use [[Arbitrary-precision arithmetic|integer arithmetic]] to simulate floating point, using 32 bit numbers, is to multiply the coefficients by 65536.
Using [[binary scientific notation]], this will place the binary point at B16. That is to say there are 16 binary integer bits and the remainder are fractional. This means, as a signed two's complement integer B16 number can hold a highest value of <math> \approx 32767.999 </math> and a lowest value of <math> -32768.0</math> . Put another way, the B number, is the number of integer bits used to represent the number which defines its value range. Remaining
For instance, to represent 1.2 and 5.6 floating point real numbers as B16 one multiplies them by 2<sup>16</sup>, giving 78643 and 367001.
|