Binary scaling: Difference between revisions

Content deleted Content added
stop saying floating point if you are trying to say REAL NUMBERS. the point does not even float!
Overview: simplification ("non-integer point value" is meaningless).
Line 4:
 
==Overview==
A representation of a non-integer point value using binary scaling is more precise than a floating -point representation occupying the same number of bits, but typically represents values of a more limited range, therefore more easily leading to [[arithmetic overflow]] during computation. Implementation of operations using integer arithmetic instructions is often (but not always) faster than the corresponding floating -point instructions.
 
A position for the 'binary point' is chosen for each variable to be represented, and binary shifts associated with arithmetic operations are adjusted accordingly. The binary scaling corresponds in [[Q (number format)]] to the first digit, i.e. Q1.15 is a 16 bit integer scaled with one bit as integer and fifteen as fractional. A Bscal 1 or Q1.15 number would represent approximately 1.999 to −2.0.
 
To give an example, a common way to use [[arbitrary-precision arithmetic|integer arithmetic]] to simulate floating point, using 32 -bit numbers, is to multiply the coefficients by 65536.
 
Using [[binary scientific notation]], this will place the binary point at B16. That is to say, the most significant 16 bits represent the integer part the remainder are represent the fractional part. This means, as a signed two's complement integer B16 number can hold a highest value of <math> \approx 32767.9999847 </math> and a lowest value of −32768.0. Put another way, the B number, is the number of integer bits used to represent the number which defines its value range. Remaining low bits (i.e. the non-integer bits) are used to store fractional quantities and supply more accuracy.