==Overview==
A representation of a value using binary scaling is more precise than a floating point representation occupying the same number of bits, but can represent only values within a fixed range, so it more easily leads to [[arithmetic overflow]] during computation. Implementing operations with integer arithmetic instructions is often (but not always) faster than using the corresponding floating point instructions.
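The overflow risk can be illustrated with a short sketch (Python here for clarity; the B16 layout and the names `SCALE`, `WORD_MAX`, and `to_fixed` are assumptions for illustration, not part of the article):

```python
SCALE = 1 << 16           # B16: binary point after 16 fractional bits
WORD_MAX = (1 << 31) - 1  # largest value of a signed 32-bit word

def to_fixed(x):
    """Convert a float to B16 fixed point by scaling and truncating (assumed helper)."""
    return int(x * SCALE)

# The largest magnitude a signed 32-bit B16 word can hold is just under 32768,
# so the product of two modest in-range values already overflows the word:
a, b = to_fixed(300.0), to_fixed(200.0)
product = (a * b) >> 16   # exact here only because Python integers are unbounded
print(product > WORD_MAX)  # True: 300 * 200 = 60000 does not fit in 32-bit B16
```

A floating point representation with the same 32 bits would absorb this product by adjusting its exponent, at the cost of precision; the fixed-point word simply has no room for it.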
To give an example, a common way to use [[Arbitrary-precision arithmetic|integer arithmetic]] to simulate floating point, using 32 bit numbers, is to multiply the coefficients by 65536 (2<sup>16</sup>).
Using [[binary scientific notation]], this will place the binary point at B16. That is to say, the most significant 16 bits represent the integer part and the remaining 16 bits the fractional part.
For instance, to represent 1.2 and 5.6, each value is multiplied by 65536 and truncated, giving 78643 and 367001 respectively.
Multiplying these together gives 28862059643, with the binary point now at B32, since the product of two B16 numbers has 32 fractional bits.
To convert it back to B16, divide it by 2<sup>16</sup>.
This gives 440400B16, which when converted back to a floating point number (by dividing again by 2<sup>16</sup>, but holding the result as floating point) gives 6.71997. The correct floating point result is 6.72.
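The worked example can be reproduced in a few lines (Python for illustration; the name `SCALE` is an assumption, not from the article):

```python
SCALE = 1 << 16  # B16: multiply by 65536 to place the binary point

a = int(1.2 * SCALE)  # 78643  (1.2 * 65536 = 78643.2, truncated)
b = int(5.6 * SCALE)  # 367001 (5.6 * 65536 = 367001.6, truncated)

product_b32 = a * b              # 28862059643; the binary point is now at B32
product_b16 = product_b32 >> 16  # 440400; shift right 16 bits to return to B16
print(product_b16 / SCALE)       # 6.719970703125, close to the exact 6.72
```

The small error comes from truncating the scaled inputs and the shifted product; rounding instead of truncating at each step would reduce it.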
==Re-scaling after multiplication==