For example, a common way to simulate floating point with [[Arbitrary-precision arithmetic|integer arithmetic]] on 32-bit numbers is to multiply the coefficients by 65536 (2<sup>16</sup>).
Using [[binary scientific notation]], this places the binary point at B16: 16 bits hold the integer part and the remaining 16 bits hold the fraction. As a signed two's complement integer, a B16 number can therefore hold a highest value of <math> \approx 32767.99998 </math> and a lowest value of <math> -32768.0 </math>.
For instance, to represent the real numbers 1.2 and 5.6 as B16, one multiplies them by 2<sup>16</sup> and discards the fractional part of the result, giving 78643 and 367001.
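
The scaling described above can be sketched in a few lines of Python. This is only an illustration: the names <code>SCALE</code>, <code>to_b16</code> and <code>from_b16</code> are hypothetical, and truncation toward zero is assumed because it reproduces the values 78643 and 367001 quoted above (rounding to nearest would give 367002 for 5.6).

```python
# Minimal sketch of B16 scaling in a 32-bit signed word.
# SCALE, to_b16 and from_b16 are illustrative names, not an established API.

SCALE = 1 << 16           # 65536 = 2**16, i.e. 16 fractional bits

INT32_MAX = 2**31 - 1     # largest 32-bit two's complement integer
INT32_MIN = -(2**31)      # smallest 32-bit two's complement integer


def to_b16(x: float) -> int:
    """Scale a real number by 2**16 and truncate toward zero (assumed)."""
    return int(x * SCALE)


def from_b16(n: int) -> float:
    """Recover the approximate real value from a B16 integer."""
    return n / SCALE


if __name__ == "__main__":
    print(to_b16(1.2), to_b16(5.6))   # 78643 367001
    print(from_b16(INT32_MAX))        # just under 32768
    print(from_b16(INT32_MIN))        # -32768.0
```

Converting back with <code>from_b16</code> recovers each value only to within 2<sup>−16</sup>, which is the resolution of the B16 format.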