Revision as of 11:11, 21 February 2018 edit Robin48gx (talk \| contribs) 134 edits m →first example uses 32 bit numbers to hold values but this was not made explicit ← Previous edit		Revision as of 11:30, 21 February 2018 edit undo Robin48gx (talk \| contribs) 134 edits →Overview Next edit →
Line 11: To give an example, a common way to use [[Arbitrary-precision arithmetic\|integer arithmetic]] to simulate floating point, using 32 bit numbers, is to multiply the coefficients by 65536. Using [[binary scientific notation]], this will place the binary point at B16. That is to say there are 16 binary integer bits and the remainder are fractional. This means, as a signed two's complement integer B16 number can hold a highest value of <math> \approx 32767.999 </math> and a lowest value of <math> -32768.0</math> . For instance, to represent 1.2 and 5.6 floating point real numbers as B16 one multiplies them by 2<sup>16</sup>, giving 78643 and 367001.

Binary scaling: Difference between revisions