Octuple-precision floating-point format

This is an old revision of this page, as edited by Vincent Lefèvre at 15:51, 16 May 2015 (IEEE 754 octuple-precision binary floating-point format: binary256: various corrections (there was confusion with quadruple precision!) - Please check!). It may differ significantly from the current revision.

In computing, octuple precision is a binary floating-point-based computer number format that occupies 32 bytes (256 bits or 64 nibbles) in computer memory. This 256-bit octuple precision is intended for applications that require results with higher precision than quadruple precision. The format is rarely, if ever, used, and very little software or hardware supports it.

IEEE 754 octuple-precision binary floating-point format: binary256

The IEEE 754 standard specifies a binary256 format among the interchange formats (it is not a basic format), as having:

Sign bit: 1 bit
Exponent width: 19 bits
Significand precision: 237 bits (236 explicitly stored)

The format is written with an implicit lead bit with value 1 unless the exponent is all zeros. Thus only 236 bits of the significand appear in the memory format, but the total precision is 237 bits (approximately 71 decimal digits: $\log_{10}(2^{237}) \approx 71.344$).
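As a minimal illustration, assuming the field layout 1 sign bit, 19 exponent bits and 236 stored significand bits (from most to least significant), the following Python sketch splits a 256-bit pattern held in an ordinary Python integer into its three fields and recomputes the decimal-digit figure. The name `split_fields` is hypothetical and not part of any library.

```python
import math

# Hypothetical helper, assuming the binary256 layout (most significant first):
# 1 sign bit | 19 exponent bits | 236 stored significand (fraction) bits.
EXP_BITS = 19
FRAC_BITS = 236


def split_fields(bits: int) -> tuple[int, int, int]:
    """Split a 256-bit pattern into (sign, biased exponent, fraction)."""
    if not 0 <= bits < 1 << 256:
        raise ValueError("expected an unsigned 256-bit integer")
    fraction = bits & ((1 << FRAC_BITS) - 1)
    exponent = (bits >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
    sign = bits >> (FRAC_BITS + EXP_BITS)
    return sign, exponent, fraction


# 236 stored bits plus the implicit leading bit give 237 bits of precision,
# which corresponds to 237 * log10(2) decimal digits.
decimal_digits = (FRAC_BITS + 1) * math.log10(2)

if __name__ == "__main__":
    neg_inf = int("fffff" + "0" * 59, 16)  # the -infinity bit pattern
    print(split_fields(neg_inf))           # (1, 524287, 0)
    print(f"{decimal_digits:.3f}")         # 71.344
```

Holding the pattern in a single Python integer keeps the masking and shifting exact, since Python integers are not limited to 64 bits.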

Exponent encoding

The octuple-precision binary floating-point exponent is encoded using an offset binary representation, with the zero offset being 262143; this offset is also known as the exponent bias in the IEEE 754 standard.

Thus, as defined by the offset binary representation, in order to get the true exponent, the offset of 262143 has to be subtracted from the stored exponent.
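The bias follows the general IEEE 754 rule for a $k$-bit exponent field, $2^{k-1} - 1$, which gives 262143 for the 19-bit field of binary256. A minimal sketch of the subtraction for normal numbers, using a hypothetical helper name:

```python
# Width of the binary256 exponent field.
EXP_BITS = 19
# IEEE 754 bias for a k-bit exponent field: 2**(k - 1) - 1 (262143 here).
BIAS = (1 << (EXP_BITS - 1)) - 1


def true_exponent(stored: int) -> int:
    """Hypothetical helper: unbiased exponent of a normal binary256 number."""
    # 0x00000 (zero/subnormal) and 0x7FFFF (infinity/NaN) are special encodings.
    if not 1 <= stored <= 0x7FFFE:
        raise ValueError("stored exponent must be in 0x00001..0x7FFFE")
    return stored - BIAS


if __name__ == "__main__":
    print(BIAS)                     # 262143
    print(true_exponent(0x00001))   # -262142 (smallest normal exponent)
    print(true_exponent(0x7FFFE))   # 262143 (largest normal exponent)
```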

The stored exponents 00000₁₆ and 7FFFF₁₆ are interpreted specially.

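Exponent | Significand zero | Significand non-zero | Equation
00000₁₆ | 0, −0 | subnormal numbers | $(-1)^{\text{signbit}} \times 2^{-262142} \times 0.\text{significandbits}_2$
00001₁₆, ..., 7FFFE₁₆ | normalized value | normalized value | $(-1)^{\text{signbit}} \times 2^{\text{exponentbits}_2 - 262143} \times 1.\text{significandbits}_2$
7FFFF₁₆ | ±∞ | NaN (quiet, signalling) |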
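The table can be turned into a small exact decoder. This Python sketch, assuming the 1/19/236-bit layout and using the standard-library `fractions.Fraction` for exact arithmetic, returns the value of a bit pattern. The name `decode` and the choice of returning strings for infinities and NaN are illustrative assumptions; note also that `Fraction` has no negative zero, so −0 decodes to plain 0.

```python
from fractions import Fraction

EXP_BITS, FRAC_BITS = 19, 236
BIAS = (1 << (EXP_BITS - 1)) - 1   # 262143
EXP_MAX = (1 << EXP_BITS) - 1      # 0x7FFFF, reserved for infinity and NaN


def decode(bits: int):
    """Hypothetical decoder: exact value of a binary256 pattern as a Fraction,
    or the strings "inf", "-inf" or "nan" for the special encodings.
    Fraction has no negative zero, so -0 decodes to Fraction(0)."""
    fraction = bits & ((1 << FRAC_BITS) - 1)
    exponent = (bits >> FRAC_BITS) & EXP_MAX
    sign = -1 if bits >> (FRAC_BITS + EXP_BITS) else 1
    if exponent == EXP_MAX:
        if fraction:
            return "nan"
        return "inf" if sign > 0 else "-inf"
    if exponent == 0:
        # Zero or subnormal: 0.fraction x 2**-262142 (no implicit bit).
        significand = Fraction(fraction, 1 << FRAC_BITS)
        scale = 1 - BIAS
    else:
        # Normal: 1.fraction x 2**(exponent - 262143).
        significand = 1 + Fraction(fraction, 1 << FRAC_BITS)
        scale = exponent - BIAS
    return sign * significand * Fraction(2) ** scale


if __name__ == "__main__":
    print(decode(0x3FFFF << FRAC_BITS))            # 1
    print(decode((1 << 255) | (0x3FFFF << 236)))   # -1
    print(decode(1) == Fraction(1, 2 ** 262378))   # True
```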

The minimum strictly positive (subnormal) value is $2^{-262378} \approx 10^{-78984}$ and has a precision of only one bit. The minimum positive normal value is $2^{-262142} \approx 2.4824 \times 10^{-78913}$. The maximum representable value is $2^{262144} - 2^{261907} \approx 1.6113 \times 10^{78913}$.
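These three limits can be checked exactly with Python's arbitrary-precision integers and `fractions.Fraction`. The sketch below, assuming a bias of 262143 and 236 stored significand bits, rebuilds them from the field definitions. The helper `scientific` is hypothetical and only approximates the decimal mantissa with a `float`, which is adequate for four-digit figures such as those quoted above.

```python
import math
from fractions import Fraction

FRAC_BITS = 236          # stored significand bits
BIAS = 262143
EMIN = 1 - BIAS          # -262142, exponent of the smallest normal number
EMAX = BIAS              # 262143, exponent of the largest finite number

# Smallest subnormal: only the last fraction bit set, 2**(EMIN - FRAC_BITS).
min_subnormal = Fraction(1, 2 ** (FRAC_BITS - EMIN))
# Smallest normal: 1.0 x 2**EMIN.
min_normal = Fraction(1, 2 ** -EMIN)
# Largest finite: all 237 significand bits set, (2 - 2**-236) x 2**EMAX.
max_finite = (2 - Fraction(1, 2 ** FRAC_BITS)) * 2 ** EMAX


def scientific(x: Fraction) -> tuple[float, int]:
    """Return (m, k) with x ~= m * 10**k and 1 <= m < 10, for x > 0."""
    # math.log10 accepts arbitrarily large ints, avoiding float overflow.
    lg = math.log10(x.numerator) - math.log10(x.denominator)
    k = math.floor(lg)
    return 10 ** (lg - k), k


if __name__ == "__main__":
    for name, value in [("min subnormal", min_subnormal),
                        ("min normal", min_normal),
                        ("max finite", max_finite)]:
        m, k = scientific(value)
        print(f"{name}: {m:.4f} x 10^{k}")
```

Working from exact `Fraction` values and converting to decimal only at the end avoids any intermediate overflow or underflow, since these magnitudes are far outside the range of a Python `float`.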

Octuple-precision examples

These examples are given in bit representation, in hexadecimal, of the floating-point value. This includes the sign, (biased) exponent, and significand.

0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000  = +0
8000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000  = −0
7fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000   = +infinity
ffff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000   = −infinity
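Patterns like the ones above can be generated mechanically. The following Python sketch, assuming the 1/19/236-bit layout, assembles each pattern from its fields and prints it as 16 groups of four lowercase hex digits, matching the presentation of the examples; `pack` and `to_hex_groups` are hypothetical helper names.

```python
# Hypothetical helpers, assuming the binary256 layout:
# 1 sign bit | 19 exponent bits | 236 fraction bits.
EXP_BITS, FRAC_BITS = 19, 236
EXP_MAX = (1 << EXP_BITS) - 1   # all-ones exponent, 0x7FFFF


def pack(sign: int, exponent: int, fraction: int) -> int:
    """Assemble a 256-bit pattern from its three fields."""
    return (sign << (EXP_BITS + FRAC_BITS)) | (exponent << FRAC_BITS) | fraction


def to_hex_groups(bits: int) -> str:
    """Format a 256-bit pattern as 16 space-separated groups of 4 hex digits."""
    digits = format(bits, "064x")
    return " ".join(digits[i:i + 4] for i in range(0, 64, 4))


EXAMPLES = {
    "+0": pack(0, 0, 0),
    "-0": pack(1, 0, 0),
    "+infinity": pack(0, EXP_MAX, 0),
    "-infinity": pack(1, EXP_MAX, 0),
}

if __name__ == "__main__":
    for name, bits in EXAMPLES.items():
        print(f"{to_hex_groups(bits)}  = {name}")
```

Because the 19-bit exponent does not end on a hex-digit boundary, the all-ones exponent shows up as `7fff f` (or `ffff f` with the sign bit set) rather than as a whole number of `f` groups.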

By default (rounding to nearest), 1/3 rounds down, as in double precision, because the significand has an odd number of bits (237). The bits beyond the rounding point are then 0101..., which is less than half a unit in the last place.
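The rounding argument can be checked with exact rational arithmetic. The sketch below assumes round-to-nearest, ties-to-even (the IEEE 754 default) and a positive value in the normal range. It uses `fractions.Fraction` to round a rational number to 237 significant bits and returns the binary256 bit pattern. The function name is hypothetical, and the sketch does not handle subnormals, overflow, zero or negative inputs.

```python
from fractions import Fraction

PRECISION = 237              # significand bits, including the implicit 1
FRAC_BITS = PRECISION - 1    # 236 stored fraction bits
BIAS = 262143


def round_to_binary256(x: Fraction) -> int:
    """Hypothetical helper: round x > 0 to nearest, ties to even, and return
    the 256-bit pattern. Assumes x is in the normal range (no subnormal,
    overflow, zero or sign handling)."""
    # Find e such that 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    # Scale so the 237 kept bits form the integer part: 2**236 <= scaled < 2**237.
    scaled = x / Fraction(2) ** (e - FRAC_BITS)
    q, r = divmod(scaled.numerator, scaled.denominator)
    # r / denominator is the discarded part, in units in the last place.
    if 2 * r > scaled.denominator or (2 * r == scaled.denominator and q & 1):
        q += 1
    if q == 1 << PRECISION:  # rounding up carried into the next binade
        q >>= 1
        e += 1
    return ((e + BIAS) << FRAC_BITS) | (q - (1 << FRAC_BITS))


if __name__ == "__main__":
    digits = format(round_to_binary256(Fraction(1, 3)), "064x")
    print(" ".join(digits[i:i + 4] for i in range(0, 64, 4)))
```

For 1/3, the part discarded at the rounding point is exactly 1/3 of a unit in the last place (binary 0.0101...), so the result rounds down. It prints as `3fff d555 5555 ... 5555`: sign 0, biased exponent 3FFFD₁₆ = 262143 − 2, and a fraction of 59 hex digits 5.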

Implementations

Octuple precision is rarely implemented in software, since there is little demand for it. General arbitrary-precision arithmetic libraries can be used to obtain octuple (or higher) precision, but specialized octuple-precision implementations may achieve higher performance.
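As a small illustration of the arbitrary-precision route, Python's built-in integers are already unbounded, so `math.isqrt` (available since Python 3.8) can produce a correctly rounded 237-bit significand of √2 and from it a binary256 encoding. This is a sketch for one specific value under the layout assumed elsewhere in this article, not a general octuple-precision library; the helper name is hypothetical.

```python
import math

PRECISION = 237            # binary256 significand bits (236 stored + implicit 1)
FRAC_BITS = PRECISION - 1  # 236
BIAS = 262143


def sqrt2_significand() -> int:
    """Hypothetical helper: round-to-nearest(sqrt(2) * 2**236), a 237-bit int."""
    # Keep one guard bit: t = floor(sqrt(2) * 2**237), so 2**237 <= t < 2**238.
    t = math.isqrt(2 << (2 * PRECISION))
    # sqrt(2) is irrational, so the value is never exactly halfway between two
    # candidates; adding 1 before dropping the guard bit rounds to nearest.
    return (t + 1) >> 1


q = sqrt2_significand()
# sqrt(2) lies in [1, 2), so its unbiased exponent is 0 (stored as BIAS);
# the implicit leading 1 is removed from the stored fraction.
sqrt2_bits = (BIAS << FRAC_BITS) | (q - (1 << FRAC_BITS))

if __name__ == "__main__":
    print(format(sqrt2_bits, "064x"))
```

A general library would need the same guard-bit and rounding logic for every operation, plus handling of subnormals, overflow and special values, which is where specialized implementations can save work compared with generic arbitrary-precision code.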

Computer-language support

In C++, it is possible to write a library that handles octuple-precision floating-point arithmetic. More generally, octuple-precision binary floating-point arithmetic can in principle be emulated in software in any language that provides integer or arbitrary-precision arithmetic.

Hardware support

There is little to no hardware support for octuple-precision arithmetic.
