Quadruple-precision floating-point format: Difference between revisions

Citation bot (talk | contribs)
Add: authors 1-2. Removed parameters. Some additions/deletions were parameter name changes. | Use this bot. Report bugs. | Suggested by Dominic3203 | Category:Binary arithmetic | #UCB_Category 69/100
added sign 1 is negative, degrading precision for denormals, 'integral decoding'
Line 15:
* [[Significand]] [[precision (arithmetic)|precision]]: 113 bits (112 explicitly stored)
<!-- "significand", with a d at the end, is a technical term, please do not confuse with "significant" -->
 
The sign bit determines the sign of the number (including when this number is zero, which is [[Signed zero|signed]]). A sign bit of 1 denotes a negative number; 0 denotes a positive one.
 
This gives from 33 to 36 significant decimal digits of precision. If a decimal string with at most 33 significant digits is converted to the IEEE 754 quadruple-precision format, giving a normal number, and then converted back to a decimal string with the same number of digits, the final result should match the original string. If an IEEE 754 quadruple-precision number is converted to a decimal string with at least 36 significant digits, and then converted back to quadruple-precision representation, the final result must match the original number.<ref name="whyieee">{{cite web |last=Kahan |first=William |date=1 October 1987 |title=Lecture Notes on the Status of IEEE Standard 754 for Binary Floating-Point Arithmetic |url=http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF}}</ref>
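The 33 and 36 figures can be reproduced from the significand width alone. The following is a minimal sketch, assuming the commonly cited round-trip bounds floor((p − 1)·log<sub>10</sub>2) and ceil(1 + p·log<sub>10</sub>2) for a binary format with p significand bits; the function name <code>decimal_digit_bounds</code> is hypothetical and chosen only for illustration.

```python
import math


def decimal_digit_bounds(p):
    """Return (d_lo, d_hi) for a binary format with p significand bits.

    d_lo: decimal strings with up to d_lo significant digits survive
          decimal -> binary -> decimal (for normal numbers).
    d_hi: printing d_hi significant digits is enough for
          binary -> decimal -> binary to recover the original number.
    """
    log10_2 = math.log10(2)
    d_lo = math.floor((p - 1) * log10_2)
    d_hi = math.ceil(1 + p * log10_2)
    return d_lo, d_hi


# Quadruple precision has p = 113 (112 stored bits plus the implicit bit).
print(decimal_digit_bounds(113))  # (33, 36)
```

The same function gives (15, 17) for double precision (p = 53) and (6, 9) for single precision (p = 24).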
 
The format is written with an implicit leading bit with value 1 unless the exponent field is stored as all zeros. Thus only 112 bits of the [[significand]] appear in the memory format, but the total precision is 113 bits (approximately 34 decimal digits: {{nowrap|log<sub>10</sub>(2<sup>113</sup>) ≈ 34.016}}) for normal values. Subnormal numbers (denormals) have gracefully degrading precision, down to 1 bit for the smallest non-zero value. The bits are laid out as:
 
[[File:IEEE 754 Quadruple Floating Point Format.svg|800px|A sign bit, a 15-bit exponent, and a 112-bit significand]]
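
The layout above can be illustrated with a short decoder. This is a minimal sketch rather than a reference implementation: it assumes the 128-bit pattern is supplied as a Python integer with the sign bit in the most significant position, and the name <code>decode_binary128</code> is hypothetical. Exact rational arithmetic is used because Python has no built-in quadruple-precision type.

```python
from fractions import Fraction

EXP_BITS = 15     # width of the exponent field
FRAC_BITS = 112   # explicitly stored significand bits
BIAS = 16383      # exponent bias


def decode_binary128(bits):
    """Decode a 128-bit pattern (given as an int) into an exact value.

    Returns a Fraction for finite numbers, or the string "inf", "-inf"
    or "nan" for the special encodings. A Fraction cannot carry the sign
    of zero, so both +0 and -0 decode to Fraction(0).
    """
    sign = bits >> 127
    exponent = (bits >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
    fraction = bits & ((1 << FRAC_BITS) - 1)

    if exponent == (1 << EXP_BITS) - 1:      # all ones: infinity or NaN
        if fraction:
            return "nan"
        return "-inf" if sign else "inf"

    if exponent == 0:
        # Zero or subnormal: no implicit leading 1, exponent fixed at 1 - BIAS.
        significand = Fraction(fraction, 1 << FRAC_BITS)
        unbiased = 1 - BIAS
    else:
        # Normal number: implicit leading 1 gives 113 bits of precision.
        significand = 1 + Fraction(fraction, 1 << FRAC_BITS)
        unbiased = exponent - BIAS

    value = significand * Fraction(2) ** unbiased
    return -value if sign else value


one = decode_binary128(0x3FFF << 112)
minus_two = decode_binary128(0xC000 << 112)
smallest = decode_binary128(1)   # smallest positive subnormal, 2**-16494
```

Here <code>one</code> is exactly 1, <code>minus_two</code> is exactly −2, and <code>smallest</code> is the single-bit subnormal mentioned above.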
Line 165 ⟶ 167:
 
Quadruple-precision (128-bit) hardware implementation should not be confused with "128-bit FPUs" that implement [[Single instruction, multiple data|SIMD]] instructions, such as [[Streaming SIMD Extensions]] or [[AltiVec]]. In that context, "128-bit" refers to [[Vector processor|vectors]] of four 32-bit single-precision or two 64-bit double-precision values that are operated on simultaneously.
 
== Alternative decoding of the significand ==
The IEEE 754 standard allows two equivalent views (decodings) of a floating-point number. The one described above treats the significand as a fraction and uses an exponent bias of 16383. The other treats the significand as a binary integer, which is 2<sup>112</sup> times larger; to compensate, the exponent bias is 112 larger, 16495, so the effective exponent is 112 smaller and the represented value is the same. The fractional view is common for the binary formats (such as binary128), while the integral view is common for the decimal formats (such as decimal128). See section 3.3, "Sets of floating-point data", in the 2019 version of the standard.
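
The equivalence of the two views can be checked numerically. The sketch below covers normal numbers only and takes the biased exponent field and the 112-bit stored fraction as separate integers; the function names <code>fractional_view</code> and <code>integral_view</code> are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction

P = 113                                      # precision in bits
BIAS_FRACTIONAL = 16383                      # bias with a fractional significand
BIAS_INTEGRAL = BIAS_FRACTIONAL + (P - 1)    # 16495, bias with an integer significand


def fractional_view(biased_exp, frac_field):
    """Value as 1.f * 2**(E - 16383), with the significand in [1, 2)."""
    m = 1 + Fraction(frac_field, 1 << (P - 1))
    return m * Fraction(2) ** (biased_exp - BIAS_FRACTIONAL)


def integral_view(biased_exp, frac_field):
    """Value as c * 2**(E - 16495), with c a 113-bit integer."""
    c = (1 << (P - 1)) | frac_field          # restore the implicit leading 1
    return c * Fraction(2) ** (biased_exp - BIAS_INTEGRAL)


# Biased exponent 16384 with only the top fraction bit set encodes 1.5 * 2 = 3.
a = fractional_view(16384, 1 << 111)
b = integral_view(16384, 1 << 111)
print(a == b)  # True
```

Both functions return the same exact rational value for any normal encoding, because multiplying the significand by 2<sup>112</sup> is cancelled by subtracting 112 from the exponent.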
 
== See also ==