Single-precision floating-point format: Difference between revisions

{{short description|32-bit computer number format}}
{{Cleanup|reason=<br/>{{*}} This article doesn't provide a good structure to lead users from easy to deeper understanding<br/>{{*}} Some points are 'explained' by lengthy examples instead of concise description of the concept<br />|date=January 2025}}
 
'''Single-precision floating-point format''' (sometimes called '''FP32''' or '''float32''') is a [[computer number format]], usually occupying [[32 bits]] in [[computer memory]]; it represents a wide [[dynamic range]] of numeric values by using a [[floating point|floating radix point]].
The following examples give the bit ''representation'' of floating-point values in [[Binary number|binary]] and [[hexadecimal]]. Each binary pattern is shown as the sign bit, the (biased) exponent, and the significand field.
 
{| style="font-family: monospace, monospace;"
|-
|
0 00000000 00000000000000000000001<sub>2</sub> = 0000 0001<sub>16</sub> = 2<sup>−126</sup> × 2<sup>−23</sup> = 2<sup>−149</sup> ≈ 1.4012984643 × 10<sup>−45</sup><br />
{{spaces|38}}(smallest positive subnormal number)
 
0 00000000 11111111111111111111111<sub>2</sub> = 007f ffff<sub>16</sub> = 2<sup>−126</sup> × (1 − 2<sup>−23</sup>) ≈ 1.1754942107 × 10<sup>−38</sup><br />
{{spaces|38}}(largest subnormal number)
 
0 00000001 00000000000000000000000<sub>2</sub> = 0080 0000<sub>16</sub> = 2<sup>−126</sup> ≈ 1.1754943508 × 10<sup>−38</sup><br />
{{spaces|38}}(smallest positive normal number)
 
0 11111110 11111111111111111111111<sub>2</sub> = 7f7f ffff<sub>16</sub> = 2<sup>127</sup> × (2 − 2<sup>−23</sup>) ≈ 3.4028234664 × 10<sup>38</sup><br />
{{spaces|38}}(largest normal number)
 
0 01111110 11111111111111111111111<sub>2</sub> = 3f7f ffff<sub>16</sub> = 1 − 2<sup>−24</sup> ≈ 0.999999940395355225<br />
{{spaces|38}}(largest number less than one)
 
0 01111111 00000000000000000000000<sub>2</sub> = 3f80 0000<sub>16</sub> = 1 (one)
 
0 01111111 00000000000000000000001<sub>2</sub> = 3f80 0001<sub>16</sub> = 1 + 2<sup>−23</sup> ≈ 1.00000011920928955<br />
{{spaces|38}}(smallest number larger than one)
 
1 10000000 00000000000000000000000<sub>2</sub> = c000 0000<sub>16</sub> = −2<br />
0 00000000 00000000000000000000000<sub>2</sub> = 0000 0000<sub>16</sub> = 0<br />
1 00000000 00000000000000000000000<sub>2</sub> = 8000 0000<sub>16</sub> = −0
 
0 11111111 00000000000000000000000<sub>2</sub> = 7f80 0000<sub>16</sub> = infinity<br />
1 11111111 00000000000000000000000<sub>2</sub> = ff80 0000<sub>16</sub> = −infinity
 
0 10000000 10010010000111111011011<sub>2</sub> = 4049 0fdb<sub>16</sub> ≈ 3.14159274101257324 ≈ π (pi)<br />
0 01111101 01010101010101010101011<sub>2</sub> = 3eaa aaab<sub>16</sub> ≈ 0.333333343267440796 ≈ 1/3

x 11111111 10000000000000000000001<sub>2</sub> = ffc0 0001<sub>16</sub> = qNaN (on x86 and ARM processors)<br />
x 11111111 00000000000000000000001<sub>2</sub> = ff80 0001<sub>16</sub> = sNaN (on x86 and ARM processors)
|}
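
The bit patterns in the table can be checked mechanically. The following is a minimal Python sketch, not part of any standard, that reinterprets a 32-bit pattern as a single-precision value using the standard-library <code>struct</code> module. The helper names <code>decode_float32</code> and <code>fields</code> are hypothetical and chosen only for illustration; the field widths (1 sign bit, 8 exponent bits, 23 significand bits) are assumed from the grouping used in the table.

```python
import math
import struct

def decode_float32(bits):
    """Reinterpret a 32-bit unsigned integer as a single-precision value.

    The result is a Python float (double precision), which can hold every
    single-precision value exactly.
    """
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def fields(bits):
    """Split a 32-bit pattern into (sign, biased exponent, significand field)."""
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

# Hexadecimal patterns from the table, paired with the stated exact values.
examples = {
    0x00000001: 2.0 ** -149,                     # smallest positive subnormal
    0x007FFFFF: 2.0 ** -126 * (1 - 2.0 ** -23),  # largest subnormal
    0x00800000: 2.0 ** -126,                     # smallest positive normal
    0x7F7FFFFF: 2.0 ** 127 * (2 - 2.0 ** -23),   # largest normal
    0x3F7FFFFF: 1 - 2.0 ** -24,                  # largest number less than one
    0x3F800000: 1.0,                             # one
    0x3F800001: 1 + 2.0 ** -23,                  # smallest number larger than one
    0xC0000000: -2.0,
}

for bits, expected in examples.items():
    assert decode_float32(bits) == expected

# Special values: infinities and NaN.
assert decode_float32(0x7F800000) == math.inf
assert decode_float32(0xFF800000) == -math.inf
assert math.isnan(decode_float32(0xFFC00001))

print(fields(0xC0000000))  # (1, 128, 0): sign 1, exponent 10000000, significand field 0
```

The comparisons use exact equality because every value in the table is exactly representable in double precision, so no tolerance is needed.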
 
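Under the default rounding mode (round to nearest, ties to even), 1/3 rounds up in single precision, whereas it rounds down in [[Double-precision floating-point format|double precision]]. The difference comes from the width of the significand. Single precision has an even number of significand bits (24, counting the implicit leading bit), so the bits of 1/3 beyond the rounding point are <code>1010...</code>, which is more than 1/2 of a [[unit in the last place]]. With the 53-bit significand of double precision, the discarded bits are <code>0101...</code>, which is less than 1/2 of a unit in the last place.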
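
The rounding direction can be observed with exact rational arithmetic. The sketch below is an illustration only; the helper name <code>to_float32</code> is hypothetical. It assumes that Python's <code>float</code> is double precision and that <code>struct.pack</code> with the <code>f</code> format rounds to the nearest single-precision value, which is the case on common platforms. Here 1/3 is first rounded to double precision and then to single precision; for this particular value the intermediate rounding does not change the single-precision result.

```python
import struct
from fractions import Fraction

def to_float32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

third = Fraction(1, 3)                 # exact rational 1/3
single = Fraction(to_float32(1 / 3))   # exact value of the single-precision result
double = Fraction(1 / 3)               # exact value of the double-precision result

print(single > third)  # True: single precision rounds 1/3 up
print(double < third)  # True: double precision rounds 1/3 down

# The single-precision bit pattern matches the table entry for 1/3.
print(struct.pack(">f", 1 / 3).hex())  # 3eaaaaab
```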