Fixed-point arithmetic: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 01:15, 18 June 2025 edit Artoria2e5 (talk \| contribs) Extended confirmed users, IP block exemptions 38,988 edits No edit summary Tags: Mobile edit Mobile web edit Advanced mobile edit ← Previous edit		Latest revision as of 03:16, 13 August 2025 edit undo Folkezoft (talk \| contribs) Extended confirmed users 49,127 edits m →Notations: Tag Bare URLs using AWB Tag: AWB
(21 intermediate revisions by 2 users not shown)
Line 8: In [[computing]], '''fixed-point''' is a method of representing [[fraction (mathematics)\|fractional]] (non-integer) numbers by storing a fixed number of digits of their fractional part. [[US dollar\|Dollar]] amounts, for example, are often stored with exactly two fractional digits, representing the [[cent (currency)\|cents]] (1/100 of dollar). More generally, the term may refer to representing fractional values as integer multiples of some fixed small unit, e.g. a fractional amount of hours as an integer multiple of ten-minute intervals. Fixed-point number representation is often contrasted to the more complicated and computationally demanding [[floating-point representation]]. In the fixed-point representation, the fraction is often expressed in the same [[number base]] as the integer part, but using negative [[exponentiation\|powers]] of the base ''b''. The most common variants are [[decimal]] (base 10) and [[binary number\|binary]] (base 2). The latter is commonly known also as '''binary scaling'''. Thus, if ''n'' fraction digits are stored, the value will always be an integer [[multiple (mathematics)\|multiple]] of ''b''<sup>−''n''</sup>. Fixed-point representation can also be used to omit the low-order digits of integer values, e.g. when representing large dollar values as multiples of $1000. When decimal fixed-point numbers are displayed for human reading, the fraction digits are usually separated from those of the integer part by a [[radix character]] (usually '".'" in English, but '",'" or some other symbol in many other languages). Internally, however, there is no separation, and the distinction between the two groups of digits is defined only by the programs that handle such numbers. Fixed-point representation was the norm in [[mechanical calculator]]s. Since most modern [[Processor (computing)\|processors]] have a fast [[floating-point unit]] (FPU), fixed-point representations in processor-based implementations are now used only in special situations, such as in low-cost [[embedded system\|embedded]] [[microprocessor]]s and [[microcontroller]]s; in applications that demand high speed or low [[electric power\|power]] consumption or small [[integrated circuit\|chip]] area, like [[image processing\|image]], [[video processing\|video]], and [[digital signal processing]]; or when their use is more natural for the problem. Examples of the latter are [[accounting]] of dollar amounts, when fractions of cents must be rounded to whole cents in strictly prescribed ways; and the evaluation of [[function (mathematics)\|functions]] by [[table lookup]], or any application where rational numbers need to be represented without rounding errors (which fixed-point does but floating-point cannot). Fixed-point representation is still the norm for [[field-programmable gate array]] (FPGA) implementations, as floating-point support in an FPGA requires significantly more resources than fixed-point support.<ref name="Wong_2017"/> ==Representation== Line 108: {{Unreferenced section\|date=May 2023}} ===Addition and subtraction=== To add or subtract two values with the same implicit scaling factor, it is sufficient to add or subtract the underlying integers; the result will have their common implicit scaling factor and can thus be stored in the same program variables as the operands. These operations yield the exact mathematical result, as long as no [[arithmetic overflow\|overflow]] occurs—that is, as long as the resulting integer can be stored in the receiving program [[variable (computing)\|variable]]. If overflow happens, it occurs like with ordinary integers of the same signedness. In the unsigned and signed-via-two's-complement cases, the overflow behaviour is well-known as a [[finite group]]. If the operands have different scaling factors, then they must be converted to a common scaling factor before the operation. ===Multiplication=== To multiply two fixed-point numbers, it suffices to multiply the two underlying integers, and assume that the scaling factor of the result is the product of their scaling factors. : (p/q) * (r/s) = pr/qs The result will be exact, with no rounding, provided that it does not overflow the receiving variable. (Specifically, with integer multiplication, the product is up to twice the width of the two factors.) For example, multiplying the numbers 123 scaled by 1/1000 (0.123) and 25 scaled by 1/10 (2.5) yields the integer 123×25 = 3075 scaled by (1/1000)×(1/10) = 1/10000, that is 3075/10000 = 0.3075. As another example, multiplying the first number by 155 implicitly scaled by 1/32 (155/32 = 4.84375) yields the integer 123×155 = 19065 with implicit scaling factor (1/1000)×(1/32) = 1/32000, that is 19065/32000 = 0.59578125. Line 126 ⟶ 132: ===Division=== ToThe ~~divide~~division ~~two~~of fixed- point numbers, ~~one~~can ~~takes~~be ~~the~~understood ~~integer~~as ~~quotient~~the division of ~~their~~two ~~underlying~~fractions ~~integers~~of ~~and~~potentially ~~assumes~~different ~~that the~~denominators (scaling ~~factor~~factors). isWith ~~the~~{{frac\|p\|q}} ~~quotient~~and of{{frac\|r\|s}} ~~their~~(where ~~scaling~~p ~~factors.~~q r Ins ~~general~~are all integers), the ~~first~~naive ~~division~~approach ~~requires~~is ~~rounding~~to ~~and therefore~~rearrange the ~~result~~fraction isto ~~not~~form ~~exact.~~a new scaling factor (s/q): : (p/q) / (r/s) = (p÷r) / (s÷q) For example, division of 3456 scaled by 1/100 (34.56) and 1234 scaled by 1/1000 (1.234) yields the integer 3456÷1234 = 3 (rounded) with scale factor (1/100)/(1/1000) = 10, that is, 30. As another example, the division of the first number by 155 implicitly scaled by 1/32 (155/32 = 4.84375) yields the integer 3456÷155 = 22 (rounded) with implicit scaling factor (1/100)/(1/32) = 32/100 = 8/25, that is 22×32/100 = 7.04. IfWith ~~the~~very ~~result~~similar iss ~~not~~and ~~exact~~q, the ~~error~~above ~~introduced~~algorithm byresults ~~the~~in ~~rounding~~an ~~can~~overly becoarse ~~reduced~~scaling orfactor. ~~even~~This ~~eliminated~~can be improved by first converting the dividend to a smaller scaling factor. Say we reduce the scaling factor by ''n'' times, then we instead calculate: : (p/q) / (r/s) = (np/nq) / (r/s) = (np÷r) / (s÷nq) For example, if ''ra'' = 1.23 is represented as 123 with scaling 1/100, and ''sb'' = 6.25 is represented as 6250 with scaling 1/1000, then simple division of the integers yields 123÷6250 = 0 (rounded) with scaling factor (1/100)/(1/1000) = 10. If ''ra'' is first converted to 1,230,000 with scaling factor 1/1000000, the result will be 1,230,000÷6250 = 197 (rounded) with scale factor 1/1000 (0.197). The exact value 1.23/6.25 is 0.1968. A different way to think about the scaling is to consider division the inverse operation of multiplication. If multiplication leads to a finer scaling factor, it is reasonable that the dividend needs to have a finer scaling factor as well to recover the original value given. ===Scaling conversion=== Line 185 ⟶ 199: On the other hand, all relational [[database]]s and the [[SQL]] notation support fixed-point decimal arithmetic and storage of numbers. [[PostgreSQL]] has a special <samp>numeric</samp> type for exact storage of numbers with up to 1000 digits.<ref name="PostgreSQL"/> Moreover, in 2008 the [[International Organization for Standardization]] (ISO) published a draft technical report to extend the C programming language with fixed-point data types, for the benefit of programs running on embedded DSP processors. Two main kinds of data types are proposed, _Fract (fractional part with a minimum 7-bit precision) and _Accum (_Fract with at least 4 bits of integer part).<ref name="JTC1_2008"/> The [[GNU Compiler Collection]] (GCC) supports this draft.<ref name="gccback"/><ref name="gccuse"/> ==Detailed examples== Line 237 ⟶ 251: Various notations have been used to concisely specify the parameters of a fixed-point format. In the following list, ''f'' represents the number of fractional bits, ''m'' the number of magnitude or integer bits, ''s'' the number of sign bits (0/1 or some other alternative representation), and ''b'' the total number of bits. ** The [[Q (number format)\|Q notation]] was defined by [[Texas Instruments]].<ref name="TI_2003"/> One writes <code>Q''f''</code> to specify a signed binary fixed-point value with ''f'' fraction bits; for example, {{code\|Q15}} specifies a signed integer in two's complement notation with a scaling factor 1/2<sup>15</sup>. The code <code>Q''m''.''f''</code> specifies additionally that the number has ''m'' bits in the integer part of the value, not counting the sign bit. Thus {{code\|Q1.30}} would describe a binary fixed-point format with 1 integer bit and 30 fractional bits, which could be stored as a 32-bit 2's complement integer with scaling factor 1/2<sup>30</sup>.<ref name="TI_2003"/><ref name="mwork"/> A similar notation has been used by [[ARM architecture\|ARM]], except that they count the sign bit in the value of ''m''; so the same format above would be specified as {{code\|Q2.30}}.<ref name="ARM_2001"/><ref name="ARM_2006"/>▼ * Notations based on ''m.f'' A similar notation has been used by [[ARM architecture\|ARM]], except that they count the sign bit in the value of ''m''; so the same format above would be specified as {{code\|Q2.30}}.<ref name="ARM_2001"/><ref name="ARM_2006"/> The Embedded C proposal uses ''.f'' for unsigned fraction. s''.f'' for signed fraction, ''m.f'' for unsigned accumulator, and s''m.f'' for signed accumulator. This would translate the above to {{code\|s1.30}}, though this is not a valid type for either fraction or accumulator: in valid versions, ''m'' is at least 4 and depending on the underlying type ''f'' is at least 7, 15, or 23. Note the non-italicized s: it is simply prepended as a letter. ▲** The [[Q (number format)\|Q notation]] was defined by [[Texas Instruments]].<ref name="TI_2003"/> One writes <code>Q''f''</code> to specify a signed binary fixed-point value with ''f'' fraction bits; for example, {{code\|Q15}} specifies a signed integer in two's complement notation with a scaling factor 1/2<sup>15</sup>. The code <code>Q''m''.''f''</code> specifies additionally that the number has ''m'' bits in the integer part of the value, not counting the sign bit. Thus {{code\|Q1.30}} would describe a binary fixed-point format with 1 integer bit and 30 fractional bits, which could be stored as a 32-bit 2's complement integer with scaling factor 1/2<sup>30</sup>.<ref name="TI_2003"/><ref name="mwork"/> A similar notation has been used by [[ARM architecture\|ARM]], except that they count the sign bit in the value of ''m''; so the same format above would be specified as {{code\|Q2.30}}.<ref name="ARM_2001"/><ref name="ARM_2006"/> * The [[COBOL]] programming language originally supported decimal fixed-precision with arbitrary size and decimal scaling, whose format was specified "graphically" with the {{mono\|PIC}} directive. For example, {{code\|PIC S9999V99}} specified a sign-magnitude 6-digit decimal integer with two decimal fraction digits.<ref name="cobibm"/> * The construct <code>REAL FIXED BINARY (''p'',''f'')</code> is used in the [[PL/I]] programming language, to specify a fixed-point signed binary data type with ''p'' total bits (not including sign) with ''f'' bits in the fraction part; that is a ''p''+1 bit signed integer with a scaling factor of 1/2<sup>''f''</sup>. The latter could be positive or negative. One could specify {{mono\|COMPLEX}} instead of {{mono\|REAL}}, and {{mono\|DECIMAL}} instead of {{mono\|BINARY}} for base 10. * In the [[Ada programming language]], a numeric data type can be specified by, for example,{{code\|2=ada\|type F is delta 0.01005 range -~~100~~50.0 .. ~~100~~50.0}}. The decimal bounds are translated to the next power of two, ~~meaning~~hence it means a fixed-point representation consisting of a signed binary integer in two's complement format with 7at ~~implied~~least 8 fraction bits (providing a scaling factor 1/~~128~~256) and at7 ~~least 15~~sign-and-magnitude bits ~~total~~ (ensuring an actual range from ~~−128~~−64.00 to almost +~~128~~64.00): a minimum total of 15 bits. On a 16-bit computer, the spare bit is assigned to the fractional part. Asymmetrical range constraints are also allowed,<ref name="adafix"/> though the underlying implementation remains symmetric about 0.<ref>http://www.ada-auth.org/standards/2yaarm/html/AA-3-5-9.html {{Bare URL inline\|date=August 2025}}</ref> Newer versions of Ada allow specifying an exact (including non-power-of-two) scaling factor using <code> 'Small => 0.005</code> (aspect specification), or, if the factor is a power of 10, through a decimal fixed point. * The notation <code>B''m''</code> has been used<!--BY WHO?--> to mean a fixed binary format with ''m'' bits in the integer part; the rest of the word (typically 32 bits) being fraction bits. For example, the maximum and minimum values that can be stored in a signed {{code\|B16}} number are ≈32767.9999847 and −32768.0, respectively. * The [[VisSim]] company used <code>fx''m''.''b''</code> to denote a binary fixed-point value with ''b'' total bits and ''m'' bits in the integer part; that is, a ''b''-bit integer with scaling factor 1/2<sup>''b''−''m''</sup>. Thus {{code\|fx1.16}} would mean a 16-bit number with 1 bit in the integer part and 15 in the fraction.<ref name="vsi"/>