== Other notable floating-point formats ==
In addition to the widely used [[IEEE 754]] standard formats, other floating-point formats are used, or have been used, in certain ___domain-specific areas.
* The [[Microsoft Binary Format|Microsoft Binary Format (MBF)]] was developed for the Microsoft BASIC language products, including Microsoft's first ever product, [[Altair BASIC]] (1975), the [[TRS-80|TRS-80 LEVEL II]], [[CP/M]]'s [[MBASIC]], the [[IBM PC 5150]]'s [[BASICA]], [[MS-DOS]]'s [[GW-BASIC]], and [[QuickBASIC]] prior to version 4.00. QuickBASIC versions 4.00 and 4.50 switched to the IEEE 754-1985 format but could revert to the MBF format using the /MBF command-line option. MBF was designed and developed on a simulated [[Intel 8080]] by [[Monte Davidoff]], a dormmate of [[Bill Gates]], during the spring of 1975 for the [[MITS Altair 8800]]. The initial release of July 1975 supported only a single-precision (32-bit) format, owing to the cost of memory in the 4-kilobyte MITS Altair 8800. In December 1975, the 8-kilobyte version added a double-precision (64-bit) format. A 40-bit single-precision variant was adopted for other CPUs, notably the [[MOS 6502]] ([[Apple II]], [[Commodore PET]], [[Atari]]), the [[Motorola 6800]] (MITS Altair 680), and the [[Motorola 6809]] ([[TRS-80 Color Computer]]). All Microsoft language products from 1975 through 1987 used the [[Microsoft Binary Format]] until Microsoft adopted the IEEE 754 standard.
* The [[bfloat16 floating-point format|bfloat16 format]] requires the same amount of memory (16 bits) as the [[Half-precision floating-point format|IEEE 754 half-precision format]], but allocates 8 bits to the exponent instead of 5, thus providing the same range as an [[Single-precision floating-point format|IEEE 754 single-precision]] number. The tradeoff is reduced precision, as the trailing significand field is shrunk from 10 to 7 bits. This format is mainly used in the training of [[machine learning]] models, where range is more valuable than precision. Many machine learning accelerators provide hardware support for this format; its relationship to single precision is illustrated in the first sketch after this list.
* The TensorFloat-32<ref name="Kharya_2020"/> format combines the 8-bit exponent of bfloat16 with the 10-bit trailing significand field of the half-precision format, resulting in a size of 19 bits. This format was introduced by [[Nvidia]], which provides hardware support for it in the Tensor Cores of its [[Graphics processing unit|GPUs]] based on the Nvidia Ampere architecture. The drawback of this format is its size, which is not a power of 2. However, according to Nvidia, this format should only be used internally by hardware to speed up computations, while inputs and outputs should be stored in the 32-bit single-precision IEEE 754 format.<ref name="Kharya_2020"/> Its relationship to single precision is illustrated in the second sketch after this list.
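Because bfloat16 keeps the sign bit, all 8 exponent bits, and the top 7 trailing-significand bits of the float32 encoding, a bfloat16 bit pattern is simply the 16 high-order bits of the corresponding float32 bit pattern. The following Python sketch illustrates this; it truncates for simplicity (hardware conversions typically round to nearest even), and the function names are illustrative rather than part of any library.

<syntaxhighlight lang="python">
import struct

def float32_to_bfloat16_bits(x):
    # Reinterpret the float32 as a 32-bit unsigned integer, then keep
    # the 16 high bits: 1 sign + 8 exponent + 7 significand bits.
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return bits >> 16  # truncation; hardware usually rounds to nearest even

def bfloat16_bits_to_float32(b):
    # Widening back to float32 is exact: pad the significand with zeros.
    return struct.unpack('<f', struct.pack('<I', b << 16))[0]

x = 1.0 / 3.0
print(bfloat16_bits_to_float32(float32_to_bfloat16_bits(x)))  # 0.33203125
</syntaxhighlight>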
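A TensorFloat-32 value can be sketched the same way, under the same assumptions (truncation rather than hardware rounding, illustrative function names): its 19 bits are the 1 sign bit, 8 exponent bits, and top 10 significand bits of the float32 encoding, i.e. the 19 high-order bits.

<syntaxhighlight lang="python">
import struct

def float32_to_tf32_bits(x):
    # Keep the 19 high bits of the float32 encoding:
    # 1 sign + 8 exponent + 10 significand bits (the 13 low bits are dropped).
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return bits >> 13  # truncation for simplicity

def tf32_bits_to_float32(t):
    # Every TF32 value is exactly representable as a float32.
    return struct.unpack('<f', struct.pack('<I', t << 13))[0]

x = 1.0 / 3.0
print(tf32_bits_to_float32(float32_to_tf32_bits(x)))  # 0.333251953125
</syntaxhighlight>

With three more significand bits than bfloat16 at the same exponent range, the TF32 result above (0.333251953125) is noticeably closer to 1/3 than the bfloat16 result (0.33203125).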