Division algorithm: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 15:36, 29 June 2024 edit Trivialist (talk \| contribs) Extended confirmed users, Pending changes reviewers 210,613 edits m rm TM symbol ← Previous edit		Latest revision as of 15:46, 25 August 2025 edit undo Citation bot (talk \| contribs) Bots 5,863,026 edits Added bibcode. Removed URL that duplicated identifier. Removed access-date with no URL. Removed parameters. \| Use this bot. Report bugs. \| Suggested by Headbomb \| Linked from Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Sandbox \| #UCB_webform_linked 709/1032
(34 intermediate revisions by 18 users not shown)
Line 16: is the output. == Division by repeated subtraction == The simplest division algorithm, historically incorporated into a [[greatest common divisor]] algorithm presented in [[Euclid's Elements\|Euclid's ''Elements'']], Book VII, Proposition 1, finds the remainder given two positive integers using only subtractions and comparisons: Line 55: This procedure always produces R ≥ 0. Although very simple, it takes Ω(Q) steps, and so is exponentially slower than even slow division algorithms like long division. It is useful if Q is known to be small (being an [[output-sensitive algorithm]]), and can serve as an executable specification. == Long division == {{Main\|Long division#Algorithm for arbitrary base}} Long division is the standard algorithm used for pen-and-paper division of multi-digit numbers expressed in decimal notation. It shifts gradually from the left to the right end of the dividend, subtracting the largest possible multiple of the divisor (at the digit level) at each stage; the multiples then become the digits of the quotient, and the final difference is then the remainder. Line 61: When used with a binary radix, this method forms the basis for the (unsigned) integer division with remainder algorithm below. [[Short division]] is an abbreviated form of long division suitable for one-digit divisors. [[Chunking (division)\|Chunking]]{{snd}} also known as the partial quotients method or the hangman method{{snd}} is a less-efficient form of long division which may be easier to understand. By allowing one to subtract more multiples than what one currently has at each stage, a more freeform variant of long division can be developed as well. === Integer division (unsigned) with remainder === {{See also\|Binary number#Division}} The following algorithm, the binary version of the famous [[long division]], will divide ''N'' by ''D'', placing the quotient in ''Q'' and the remainder in ''R''. In the following pseudo-code, all values are treated as unsigned integers. Line 79: </syntaxhighlight> ==== Example ==== If we take N=1100<sub>2</sub> (12<sub>10</sub>) and D=100<sub>2</sub> (4<sub>10</sub>) Line 110: Q=11<sub>2</sub> (3<sub>10</sub>) and R=0. == Slow division methods == Slow division methods are all based on a standard recurrence equation <ref>{{Cite book\|last1=Morris\|first1=James E.\| url=https://books.google.com/books?id=wAhEDwAAQBAJ&q=restoring+division+fixed-point+fractional+numbers&pg=PA243\| title=Nanoelectronic Device Applications Handbook\|last2=Iniewski\|first2=Krzysztof\|date=2017-11-22\|publisher=CRC Press\| isbn=978-1-351-83197-0\|language=en}}</ref> :<math>R_{j+1} = B \times R_j - q_{n-(j+1)}\times D ,</math> where: Line 120: * ''D'' is the divisor === Restoring division === Restoring division operates on [[fixed point arithmetic\|fixed-point]] fractional numbers and depends on the assumption 0 < ''D'' < ''N''. {{citation needed\|date=February 2012}} <!-- see "Integer division (unsigned) with remainder" on talk --> The quotient digits ''q'' are formed from the digit set {0,1}. Line 145: Non-performing restoring division is similar to restoring division except that the value of 2R is saved, so ''D'' does not need to be added back in for the case of R < 0. === Non-restoring division === Non-restoring division uses the digit set {−1, 1} for the quotient digits instead of {0, 1}. The algorithm is more complex, but has the advantage when implemented in hardware that there is only one decision and addition/subtraction per quotient bit; there is no restoring step after the subtraction,<ref>{{Cite journal \|last=Shaw \|first=Robert F. \|date=1950 \|title=Arithmetic Operations in a Binary Computer \|url=http://aip.scitation.org/doi/10.1063/1.1745692 \|journal=Review of Scientific Instruments \|language=en \|volume=21 \|issue=8 \|pages=690 \|doi=10.1063/1.1745692 \|bibcode=1950RScI...21..687S \|issn=0034-6748 \|access-date=2022-02-28 \|archive-date=2022-02-28 \|archive-url=https://web.archive.org/web/20220228182241/https://aip.scitation.org/doi/10.1063/1.1745692 \|url-status=live \|url-access=subscription }}</ref> which potentially cuts down the numbers of operations by up to half and lets it be executed faster.<ref>{{Cite web\|url=https://web.stanford.edu/class/ee486/doc/chap5.pdf\|title=Stanford EE486 (Advanced Computer Arithmetic Division){{snd}} Chapter 5 Handout (Division)\|last=Flynn\|website=Stanford University\|access-date=2019-06-24\|archive-date=2022-04-18\|archive-url=https://web.archive.org/web/20220418044630/http://web.stanford.edu/class/ee486/doc/chap5.pdf\|url-status=live}}</ref> The basic algorithm for binary (radix 2) non-restoring division of non-negative numbers is:{{check\|reason=See talk page\|date=January 2025}} <syntaxhighlight lang="lua"> Line 166: Following this algorithm, the quotient is in a non-standard form consisting of digits of −1 and +1. This form needs to be converted to binary to form the final quotient. Example: {\| border="0" cellpadding="0" ~~\|colspan=2\|Convert the following quotient to the digit set {0,1}:~~ \|- \| colspan="2" \| Convert the following quotient to the digit set {0,1}: ~~\|Start: \|\| <math>Q = 111\bar{1}1\bar{1}1\bar{1}</math>~~ \|- \|~~1. Form the positive~~ ~~term~~Start: \|\| <math>PQ = ~~11101010~~111\,bar{1}1\bar{1}1\bar{1}</math> \|- \|2 1. ~~Mask~~Form the ~~negative~~positive term: \|\| <math>MP = ~~00010101~~11101010\,</math> \|- \| 2. Mask the negative term:{{NoteTag\|Signed binary notation with [[ones' complement]] without [[two's complement]].}} \|\| <math>M = 00010101\,</math> ~~\|3. Subtract: <math>P - M</math>: \|\|<math>Q = 11010101\,</math>~~ \|- \| 3. Subtract: <math>P - M</math>: \|\| <math>Q = 11010101\,</math> ~~\|colspan=2\|.( Signed binary notation with [[ones' complement]] without [[two's complement]])~~ \|- \| colspan="2" \| {{reflist\|group=note}} \|} Line 192 ⟶ 193: </syntaxhighlight> The actual remainder is R >> n. (As with restoring division, the low-order bits of R are used up at the same rate as bits of the quotient Q are produced, and it is common to use a single shift register for both.) ==={{anchor\|SRT}}SRT division=== SRT division is a popular method for division in many [[microprocessor]] implementations.<ref>{{cite tech report \|url=http://pages.hmc.edu/harris/research/srtlong.pdf \|title=SRT Division: Architectures, Models, and Implementations \|first1=David L. \|last1=Harris \|first2=Stuart F. \|last2=Oberman \|first3=Mark A. \|last3=Horowitz \|publisher=Stanford University \|date=9 September 1998 \|access-date=23 December 2016 \|archive-date=24 December 2016 \|archive-url=https://web.archive.org/web/20161224030439/http://pages.hmc.edu/harris/research/srtlong.pdf \|url-status=live }}</ref><ref>{{cite journal \|url=https://ieeexplore.ieee.org/document/614875 \|title=SRT Division Algorithms as Dynamical Systems \|first1=Mark \|last1=McCann \|first2=Nicholas \|last2=Pippenger \|journal=SIAM Journal on Computing \|volume=34 \|issue=6 \|pages=1279–1301 \|year=2005 \|doi=10.1137/S009753970444106X \|hdl=2429/12179 \|citeseerx=10.1.1.72.6993 \|access-date=2022-08-24 \|archive-date=2022-08-24 \|archive-url=https://web.archive.org/web/20220824213238/https://ieeexplore.ieee.org/document/614875 \|url-status=live }}</ref> The algorithm is named after D. W. Sweeney of [[IBM]], James E. Robertson of [[University of Illinois]], and [[K. D. Tocher]] of [[Imperial College London]]. They all developed the algorithm independently at approximately the same time (published in February 1957, September 1958, and January 1958 respectively).<ref>{{Citation \|title=High speed arithmetic in a parallel device \|last1=Cocke \|first1=John \|pages=20 \|url=https://www.computerhistory.org/collections/catalog/102632302 \|publication-date=11 February 1957 \|last2=Sweeney \|first2=D.W. \|date=11 February 1957 \|type=Company Memo \|publication-place=IBM \|access-date=24 August 2022 \|archive-date=24 August 2022 \|archive-url=https://web.archive.org/web/20220824212341/https://www.computerhistory.org/collections/catalog/102632302 \|url-status=live }}</ref><ref>{{Cite journal \|title=A New Class of Digital Division Methods \|journal=IRE Transactions on Electronic Computers ~~\|url=https://ieeexplore.ieee.org/document/5222579~~ \|last=Robertson \|first=James \|date=1958-09-01 \|volume=EC-7 \|issue=3 \|pages=218–222 \|publisher=IEEE \|doi=10.1109/TEC.1958.5222579 \|hdl=2027/uiuo.ark:/13960/t0gt7529c \|hdl-access=free ~~\|access-date=2022-08-24 \|archive-date=2022-08-24 \|archive-url=https://web.archive.org/web/20220824213239/https://ieeexplore.ieee.org/document/5222579 \|url-status=live~~ }}</ref><ref>{{Cite journal \|journal=The Quarterly Journal of Mechanics and Applied Mathematics \|url=https://academic.oup.com/qjmam/article-abstract/11/3/364/1883426 \|last=Tocher \|first=K.D. \|title=Techniques of Multiplication and Division for Automatic Binary Computers \|date=1958-01-01 \|issue=3 \|volume=11 \|pages=364–384 \|doi=10.1093/qjmam/11.3.364 \|access-date=2022-08-24 \|archive-date=2022-08-24 \|archive-url=https://web.archive.org/web/20220824214400/https://academic.oup.com/qjmam/article-abstract/11/3/364/1883426 \|url-status=live \|url-access=subscription }}</ref> SRT division is similar to non-restoring division, but it uses a [[lookup table]] based on the dividend and the divisor to determine each quotient digit. Line 203 ⟶ 204: Like non-restoring division, the final steps are a final full-width subtraction to resolve the last quotient bit, and conversion of the quotient to standard binary form. The [[Original Intel Pentium (P5 microarchitecture)\|Intel Pentium]] processor's [[Pentium FDIV bug\|infamous floating-point division bug]] was caused by an incorrectly coded lookup table. Five of the 1066 entries had been mistakenly omitted.<ref>{{cite web \|url=http://www.intel.com/support/processors/pentium/sb/cs-012997.htm \|title=Statistical Analysis of Floating Point Flaw \|publisher=Intel Corporation \|year=1994 \|access-date=22 October 2013 \|archive-date=23 October 2013 \|archive-url=https://web.archive.org/web/20131023060231/http://www.intel.com/support/processors/pentium/sb/cs-012997.htm \|url-status=live }}</ref><ref>{{cite tech report \|url=http://i.stanford.edu/pub/cstr/reports/csl/tr/95/675/CSL-TR-95-675.pdf \|title=An Analysis of Division Algorithms and Implementations \|first1=Stuart F. \|last1=Oberman \|first2=Michael J. \|last2=Flynn \|id=CSL-TR-95-675 \|date=July 1995 \|publisher=Stanford University \|access-date=2016-12-23 \|archive-date=2017-05-17 \|archive-url=https://web.archive.org/web/20170517133304/http://i.stanford.edu/pub/cstr/reports/csl/tr/95/675/CSL-TR-95-675.pdf \|url-status=live }}</ref><ref>{{Cite web \|last=Shirriff \|first=Ken \|date=28 Dec 2024 \|title=Intel's $475 million error: the silicon behind the Pentium division bug \|url=https://www.righto.com/2024/12/this-die-photo-of-pentium-shows.html \|access-date=30 Dec 2024 \|website=Righto}}</ref> == Fast division methods == === Newton–Raphson division === ~~===Newton–Raphson division===~~ Newton–Raphson uses [[Newton's method]] to find the [[Multiplicative inverse\|reciprocal]] of <math>D</math> and multiply that reciprocal by <math>N</math> to find the {{nowrap\|final quotient <math>Q</math>.}} Line 231: This squaring of the error at each iteration step{{snd}} the so-called [[Newton's method#Practical considerations\|quadratic convergence]] of Newton–Raphson's method{{snd}} has the effect that the number of correct digits in the result roughly ''doubles for every iteration'', a property that becomes extremely valuable when the numbers involved have many digits (e.g. in the large integer ___domain). But it also means that the initial convergence of the method can be comparatively slow, especially if the initial estimate <math>X_0</math> is poorly chosen. ==== Initial estimate ==== For the subproblem of choosing an initial estimate <math>X_0</math>, it is convenient to apply a bit-shift to the divisor ''D'' to scale it so that 0.5 ≤ ''D'' ≤ 1; by applying the same bit-shift to the numerator ''N'', one ensures the quotient does not change. Then one could use a linear [[approximation]] in the form For the subproblem of choosing an initial estimate <math>X_0</math>, it is convenient to apply a bit-shift to the divisor ''D'' to scale it so that 0.5 ≤ ''D'' ≤ 1. Applying the same bit-shift to the numerator ''N'' ensures the quotient does not change. Once within a bounded range, a simple polynomial [[approximation]] can be used to find an initial estimate. ~~:<math>X_0 = T_1 + T_2 D \approx \frac{1}{D} \,</math>~~ toThe ~~initialize~~linear ~~Newton–Raphson.~~[[approximation]] Towith ~~minimize~~minimum ~~the maximum of the~~worst-case absolute ~~value~~error ofon the ~~error of this approximation on~~ interval <math>[0.5,1]</math>~~, one should~~ ~~use~~is: :<math>X_0 = {48 \over 17} - {32 \over 17} D. \,</math> The coefficients of the linear approximation <math>T_0 +T_1 D</math> are determined as follows. The absolute value of the error is <math>\|\varepsilon_0\| = \|1 - D(~~T_1~~T_0 + ~~T_2~~T_1 D)\|</math>. The minimum of the maximum absolute value of the error is determined by the [[Equioscillation theorem\|Chebyshev equioscillation theorem]] applied to <math>F(D) = 1 - D(~~T_1~~T_0 + ~~T_2~~T_1 D)</math>. The local minimum of <math>F(D)</math> occurs when <math>F'(D) = 0</math>, which has solution <math>D = -~~T_1~~T_0/(~~2T_2~~2T_1)</math>. The function at that minimum must be of opposite sign as the function at the endpoints, namely, <math>F(1/2) = F(1) = -F(-~~T_1~~T_0/(~~2T_2~~2T_1))</math>. The two equations in the two unknowns have a unique solution <math>~~T_1~~T_0 = 48/17</math> and <math>~~T_2~~T_1 = -32/17</math>, and the maximum error is <math>F(1) = 1/17</math>. Using this approximation, the absolute value of the error of the initial value is less than :<math>\vert \varepsilon_0 \vert \leq {1 \over 17} \approx 0.059 . \,</math> It is possible to generate a polynomial fit of degree larger than 1, computing the coefficients using the [[Remez algorithm]]. The trade-off is that the initial guess requires more computational cycles but hopefully in exchange for fewer iterations of Newton–Raphson. The best quadratic fit to <math>1/D</math> in the interval is ~~Since for this method the [[rate of convergence\|convergence]] is exactly quadratic, it follows that~~ :<math> X := \frac{140}{33} - \frac{64}{11} D + \frac{256}{99} D^2.</math> It is chosen to make the error equal to a re-scaled third order [[Chebyshev polynomial]] of the first kind, and gives an absolute value of the error less than or equal to 1/99. This improvement is equivalent to <math>\log_2(\log 99/\log 17) \approx 0.7</math> Newton–Raphson iterations, at a computational cost of less than one iteration. It is possible to generate a polynomial fit of degree larger than 2, computing the coefficients using the [[Remez algorithm]]. The trade-off is that the initial guess requires more computational cycles but hopefully in exchange for fewer iterations of Newton–Raphson. ~~:<math>S = \left \lceil \log_2 \frac{P + 1}{\log_2 17} \right \rceil \,</math>~~ Since for this method the [[rate of convergence\|convergence]] is exactly quadratic, it follows that, from an initial error <math>\varepsilon_0</math>, <math>S</math> iterations will give an answer accurate to steps are enough to calculate the value up to <math>P \,</math> binary places. This evaluates to 3 for IEEE [[single precision]] and 4 for both [[double precision]] and [[extended precision\|double extended]] formats. :<math>P = -2^S \log_2 \varepsilon_0 - 1 = 2^S \log_2(1/\varepsilon_0) - 1</math> binary places. Typical values are: {\| class="wikitable" style="text-align: right;" ~~====Pseudocode====~~ \|+ Binary digits of reciprocal accuracy \|- ! rowspan="2" \| <math>\varepsilon_0</math> \|\| colspan="5" \| Iterations \|- ! 0 \|\| 1 \|\| 2 \|\| 3 \|\| 4 \|- ! <math>1/17</math> \| {{round\|{{#expr: (ln17/ln2) - 1}}\|2}} \| {{round\|{{#expr: 2(ln17/ln2) - 1}}\|2}} \| {{round\|{{#expr: 4(ln17/ln2) - 1}}\|2}} \| {{round\|{{#expr: 8(ln17/ln2) - 1}}\|2}} \| {{round\|{{#expr:16(ln17/ln2) - 1}}\|2}} \|- ! <math>1/99</math> \| {{round\|{{#expr: (ln99/ln2) - 1}}\|2}} \| {{round\|{{#expr: 2(ln99/ln2) - 1}}\|2}} \| {{round\|{{#expr: 4(ln99/ln2) - 1}}\|2}} \| {{round\|{{#expr: 8(ln99/ln2) - 1}}\|2}} \| {{round\|{{#expr:16(ln99/ln2) - 1}}\|2}} \|} A quadratic initial estimate plus two iterations is accurate enough for IEEE [[single precision]], but three iterations are marginal for [[double precision]]. A linear initial estimate plus four iterations is sufficient for both double and [[extended precision\|double extended]] formats. ==== Pseudocode ==== The following computes the quotient of {{var\|N}} and {{var\|D}} with a precision of {{var\|P}} binary places: <div style="font-family: monospace, monospace;"> ~~Express D as M × 2<sup>e</sup> where 1 ≤ M < 2 (standard floating point representation)~~ Express D' :=as DM /× 2<sup>e+1</sup> ~~''// scale between 0.5 and~~where 1, ~~can~~≤ beM ~~performed~~< ~~with~~2 ~~bit~~(standard ~~shift~~floating /point ~~exponent subtraction''~~representation)<br> D' := D / 2<sup>e+1</sup> ''// scale between 0.5 and 1, can be performed with bit shift / exponent subtraction''<br> ~~N' := N / 2<sup>e+1</sup>~~ N' := N / 2<sup>e+1</sup><br> ~~X := 48/17 − 32/17 × D' ''// precompute constants with same precision as D''~~ X := 48/17 − 32/17 × D' ''// precompute constants with same precision as D''<br> {{nowrap\|'''repeat''' <math>\left \lceil \log_2 \frac{P + 1}{\log_2 17} \right \rceil \,</math> '''times'''}} ''// can be precomputed based on fixed P''<br> X := X + X × (1 - D' × X)<br> '''end'''<br> '''return''' N' × X<br> </div> For example, for a double-precision floating-point division, this method uses 10 multiplies, 9 adds, and 2 shifts. ====~~Variant~~ ~~Newton–Raphson~~Cubic iteration ~~division~~==== There is an iteration which uses three multiplications to cube the error: ~~The Newton-Raphson division method can be modified to be slightly faster as follows. After shifting ''N'' and ''D'' so that ''D'' is in [0.5, 1.0], initialize with~~ :<math> X\varepsilon_i := ~~\frac{140}{33}~~1 ~~+ D \cdot \left(\frac{~~-~~64}{11} +~~ D ~~\cdot \frac{256}{99}\right)~~X_i .</math> :<math> Y_i = X_i \varepsilon_i </math> This is the best quadratic fit to 1/''D'' and gives an absolute value of the error less than or equal to 1/99. It is chosen to make the error equal to a re-scaled third order [[Chebyshev polynomial]] of the first kind. The coefficients should be pre-calculated and hard-coded. :<math> X_{i+1} = X_i + Y_i + Y_i \varepsilon_i .</math> The ''Y<sub>i</sub>ε<sub>i</sub>'' term is new. Expanding out the above, <math>X_{i+1}</math> can be written as ~~Then in the loop, use an iteration which cubes the error.~~ :<math> ~~E := 1 - D~~ \~~cdot X </math>~~begin{align} X_{i+1} &= X_i + X_i\varepsilon_i + X_i\varepsilon_i^2 \\ ~~:<math> Y := X \cdot E </math>~~ &= X_i + X_i(1-DX_i) + X_i(1-DX_i)^2 \\ ~~:<math> X := X + Y + Y \cdot E .</math>~~ &= 3X_i - 3DX_i^2 + D^2X_i^3 , ~~The ''Y''⋅''E'' term is new.~~ \end{align}</math> with the result that the error term :<math>\begin{align} \varepsilon_{i+1} &= 1 - DX_{i+1} \\ &= 1 - 3DX_i + 3D^2X_i^2 - D^3X_i^3 \\ &= (1 - DX_i)^3 \\ &= \varepsilon_i^3 . \end{align}</math> This is 3/2 the computation of the quadratic iteration, but achieves <math>\log 3 / \log 2 \approx 1.585</math> as much convergence, so is slightly more efficient. Put another way, two iterations of this method raise the error to the ninth power at the same computational cost as three quadratic iterations, which only raise the error to the eighth power. ~~If the loop is performed until X agrees with 1/''D'' on its leading ''P'' bits, then the number of iterations will be no more than~~ ~~:<math> \left \lceil \log_3 \left(\frac{P + 1}{\log_2 99} \right) \right \rceil </math>~~ ~~which is the number of times 99 must be cubed to get to 2<sup>''P''+1</sup>. Then~~ ~~:<math> Q:= N \cdot X </math>~~ ~~is the quotient to ''P'' bits.~~ The number of correct bits after <math>S</math> iterations is Using higher degree polynomials in either the initialization or the iteration results in a degradation of performance because the extra multiplications required would be better spent on doing more iterations. :<math>P = -3^S \log_2 \varepsilon_0 - 1 = 3^S \log_2(1/\varepsilon_0) - 1</math> binary places. Typical values are: {\| class="wikitable" style="text-align: right;" \|+ Bits of reciprocal accuracy \|- ! rowspan="2" \| <math>\varepsilon_0</math> \|\| colspan="4" \| Iterations \|- ! 0 \|\| 1 \|\| 2 \|\| 3 \|- ! <math>1/17</math> \| {{round\|{{#expr: (ln17/ln2) - 1}}\|2}} \| {{round\|{{#expr: 3(ln17/ln2) - 1}}\|2}} \| {{round\|{{#expr: 9(ln17/ln2) - 1}}\|2}} \| {{round\|{{#expr:27(ln17/ln2) - 1}}\|2}} \|- ! <math>1/99</math> \| {{round\|{{#expr: (ln99/ln2) - 1}}\|2}} \| {{round\|{{#expr: 3(ln99/ln2) - 1}}\|2}} \| {{round\|{{#expr: 9(ln99/ln2) - 1}}\|2}} \| {{round\|{{#expr:27(ln99/ln2) - 1}}\|2}} \|} A quadratic initial estimate plus two cubic iterations provides ample precision for an IEEE double-precision result. It is also possible to use a mixture of quadratic and cubic iterations. Using at least one quadratic iteration ensures that the error is positive, i.e. the reciprocal is underestimated.<ref name=DigitalArithmetic>{{cite book \|title=Digital Arithmetic \|chapter=Chapter 7: Reciprocal. Division, Reciprocal Square Root, and Square Root by Iterative Approximation \|first1=Miloš D. \|last1=Ercegovac \|first2=Tomás \|last2=Lang \|pages=367–395 \|year=2004 \|publisher=Morgan Kaufmann \|isbn=1-55860-798-6 }}</ref>{{rp\|370}} This can simplify a following rounding step if an exactly-rounded quotient is required. Using higher degree polynomials in either the initialization or the iteration results in a degradation of performance because the extra multiplications required would be better spent on doing more iterations.{{cn\|date=January 2025}} === {{anchor\|AEGP}}Goldschmidt division === Goldschmidt division<ref>{{cite thesis \|first=Robert E. \|last=Goldschmidt \|title=Applications of Division by Convergence \|series=M.Sc. dissertation \|publisher=M.I.T. \|date=1964 \|oclc=34136725 \|url=http://dspace.mit.edu/bitstream/handle/1721.1/11113/34136725-MIT.pdf \|access-date=2015-09-15 \|archive-date=2015-12-10 \|archive-url=https://web.archive.org/web/20151210223340/http://dspace.mit.edu/bitstream/handle/1721.1/11113/34136725-MIT.pdf \|url-status=live }}</ref> (after Robert Elliott Goldschmidt)<ref>{{cite journal \|title=Authors \|url=https://ieeexplore.ieee.org/document/5392026 \|journal=IBM Journal of Research and Development \| year=1967 \| volume=11 \| pages=125–127 \| doi=10.1147/rd.111.0125 \|archive-url=https://web.archive.org/web/20180718114413/https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5392026 \|archive-date=18 July 2018\|url-access=subscription }}</ref> uses an iterative process of repeatedly multiplying both the dividend and divisor by a common factor ''F''<sub>''i''</sub>, chosen such that the divisor converges to 1. This causes the dividend to converge to the sought quotient ''Q'': :<math>Q = \frac{N}{D} \frac{F_1}{F_1} \frac{F_2}{F_2} \frac{F_\ldots}{F_\ldots}.</math> Line 298 ⟶ 366: After a sufficient number ''k'' of iterations <math>Q=N_k</math>. The Goldschmidt method is used in [[AMD]] Athlon CPUs and later models.<ref>{{cite book \|first=Stuart F. \|last=Oberman \|title=Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336) \|chapter=Floating point division and square root algorithms and implementation in the AMD-K7 Microprocessor \|pages=106–115 \|date=1999 \|doi=10.1109/ARITH.1999.762835 \|isbn=0-7695-0116-8 \|s2cid=12793819 \|chapter-url=http://www.acsel-lab.com/arithmetic/arith14/papers/ARITH14_Oberman.pdf \|access-date=2015-09-15 \|archive-date=2015-11-29 \|archive-url=https://web.archive.org/web/20151129095846/http://www.acsel-lab.com/arithmetic/arith14/papers/ARITH14_Oberman.pdf \|url-status=live }}</ref><ref>{{cite journal \|first1=Peter \|last1=Soderquist \|first2=Miriam \|last2=Leeser \|title=Division and Square Root: Choosing the Right Implementation \|journal=IEEE Micro \|volume=17 \|issue=4 \|pages=56–66 \|date=July–August 1997 \|url=https://www.researchgate.net/publication/2511700 \|doi=10.1109/40.612224 }}</ref> It is also known as Anderson Earle Goldschmidt Powers (AEGP) algorithm and is implemented by various [[IBM]] processors.<ref>S. F. Anderson, J. G. Earle, R. E. Goldschmidt, D. M. Powers. ''The IBM 360/370 model 91: floating-point execution unit'', [[IBM Journal of Research and Development]], January 1997</ref><ref name="goldschmidt-analysis">{{cite journal \|last1=Guy \|first1=Even \|last2=Peter \|first2=Siedel \|last3=Ferguson \|first3=Warren \|title=A parametric error analysis of Goldschmidt's division algorithm \|journal=Journal of Computer and System Sciences \|date=1 February 2005 \|volume=70 \|issue=1 \|pages=118–139 \|doi=10.1016/j.jcss.2004.08.004 \|doi-access=free }}</ref> Although it converges at the same rate as a Newton–Raphson implementation, one advantage of the Goldschmidt method is that the multiplications in the numerator and in the denominator can be done in parallel.<ref name="goldschmidt-analysis" /> ==== Binomial theorem ==== The Goldschmidt method can be used with factors that allow simplifications by the [[binomial theorem]]. Assume {{tmath\|N/D}} has been scaled by a [[power of two]] such that <math>D\in\left(\tfrac{1}{2},1\right]</math>. Line 322 ⟶ 390: which is maximum at <math>2^{-2^n}</math> when <math>x = \tfrac{1}{2}</math>, thus providing a minimum precision of <math>2^n</math> binary digits. == Large-integer methods == Methods designed for hardware implementation generally do not scale to integers with thousands or millions of decimal digits; these frequently occur, for example, in [[Modular arithmetic\|modular]] reductions in [[cryptography]]. For these large integers, more efficient division algorithms transform the problem to use a small number of multiplications, which can then be done using an asymptotically efficient [[multiplication algorithm]] such as the [[Karatsuba algorithm]], [[Toom–Cook multiplication]] or the [[Schönhage–Strassen algorithm]]. The result is that the [[computational complexity]] of the division is of the same order (up to a multiplicative constant) as that of the multiplication. Examples include reduction to multiplication by [[Newton's method]] as [[#Newton–Raphson division\|described above]],<ref>{{Cite thesis \|degree=M.Sc. in Computer Science \|title=Fast Division of Large Integers: A Comparison of Algorithms \|url=https://treskal.com/s/masters-thesis.pdf \|last=Hasselström \|first=Karl \|year=2003 \|publisher=Royal Institute of Technology \|access-date=2017-07-08 \|archive-date=8 July 2017 \|archive-url=https://web.archive.org/web/20170708221722/https://static1.squarespace.com/static/5692a9ad7086d724272eb00a/t/5692dbe6b204d50df79e577f/1452465127528/masters-thesis.pdf}}</ref> as well as the slightly faster [[Burnikel-Ziegler division]],<ref>{{citation \|url=https://domino.mpi-inf.mpg.de/internet/reports.nsf/efc044f1568a0058c125642e0064c817/a8cfefdd1ac031bbc125669b00493127/$FILE/MPI-I-98-1-022.ps \|title=Fast Recursive Division \|first=Christoph Burnikel \|last=Joachim Ziegler \|year=1998 \|___location=Max-Planck-Institut für Informatik \|access-date=2021-09-10 \|archive-date=2011-04-26 \|archive-url=https://web.archive.org/web/20110426221250/http://domino.mpi-inf.mpg.de/internet/reports.nsf/efc044f1568a0058c125642e0064c817/a8cfefdd1ac031bbc125669b00493127/$FILE/MPI-I-98-1-022.ps \|url-status=live }}</ref> [[Barrett reduction]] and [[Montgomery reduction]] algorithms.<ref>{{cite conference \|url=http://portal.acm.org/citation.cfm?id=36688 \|title=Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor \|first=Paul \|last=Barrett \|year=1987 \|publisher=Springer-Verlag \|book-title=Proceedings on Advances in cryptology---CRYPTO '86 \|pages=311–323 \|___location=London, UK \|isbn=0-387-18047-8 }}</ref>{{verification needed\|date=June 2015\|reason=Barrett reduction is usually understood to be the algorithm for computing the remainder that one gets from having precomputed the inverse of the denominator. Rather than providing a solution to the problem of division, it requires that a separate solution is already available!}} Newton's method is particularly efficient in scenarios where one must divide by the same divisor many times, since after the initial Newton inversion only one (truncated) multiplication is needed for each division. == Division by a constant == The division by a constant ''D'' is equivalent to the multiplication by its [[Multiplicative inverse\|reciprocal]]. Since the denominator is constant, so is its reciprocal (1/''D''). Thus it is possible to compute the value of (1/''D'') once at compile time, and at run time perform the multiplication ''N''·(1/''D'') rather than the division ''N/D''. In [[floating-point]] arithmetic the use of (1/''D'') presents little problem,{{efn\|Despite how "little" problem the optimization causes, this reciprocal optimization is still usually hidden behind a "fast math" flag in modern [[compiler]]s as it is inexact.}} but in [[Integer (computer science)\|integer]] arithmetic the reciprocal will always evaluate to zero (assuming \|''D''\| > 1). Line 331 ⟶ 399: It is not necessary to use specifically (1/''D''); any value (''X''/''Y'') that reduces to (1/''D'') may be used. For example, for division by 3, the factors 1/3, 2/6, 3/9, or 194/582 could be used. Consequently, if ''Y'' were a power of two the division step would reduce to a fast right bit shift. The effect of calculating ''N''/''D'' as (''N''·''X'')/''Y'' replaces a division with a multiply and a shift. Note that the parentheses are important, as ''N''·(''X''/''Y'') will evaluate to zero. However, unless ''D'' itself is a power of two, there is no ''X'' and ''Y'' that satisfies the conditions above. Fortunately, (''N''·''X'')/''Y'' gives exactly the same result as ''N''/''D'' in integer arithmetic even when (''X''/''Y'') is not exactly equal to 1/''D'', but "close enough" that the error introduced by the approximation is in the bits that are discarded by the shift operation.<ref>{{cite journal \|title=Division by Invariant Integers using Multiplication \|first1=Torbjörn \|last1=Granlund \|first2=Peter L. \|last2=Montgomery \|journal=SIGPLAN Notices \|volume=29 \|issue=6 \|date=June 1994 \|pages=61–72 \|doi=10.1145/773473.178249 \|url=http://gmplib.org/~tege/divcnst-pldi94.pdf \|citeseerx=10.1.1.1.2556 \|access-date=2015-12-08 \|archive-date=2019-06-06 \|archive-url=https://web.archive.org/web/20190606211506/https://gmplib.org/~tege/divcnst-pldi94.pdf \|url-status=live }}</ref><ref>{{cite journal \|title=Improved Division by Invariant Integers \|first1=Niels \|last1=Möller \|first2=Torbjörn \|last2=Granlund \|journal=IEEE Transactions on Computers \|date=February 2011 \|volume=60 \|issue=2 \|pages=165–175 \|doi=10.1109/TC.2010.143 \|bibcode=2011ITCmp..60..165M \|s2cid=13347152 \|url=http://gmplib.org/~tege/division-paper.pdf \|access-date=2015-12-08 \|archive-date=2015-12-22 \|archive-url=https://web.archive.org/web/20151222160554/https://gmplib.org/~tege/division-paper.pdf \|url-status=live }}</ref><ref>ridiculous_fish. [http://ridiculousfish.com/files/faster_unsigned_division_by_constants.pdf "Labor of Division (Episode III): Faster Unsigned Division by Constants"] {{Webarchive\|url=https://web.archive.org/web/20220108225258/http://ridiculousfish.com/files/faster_unsigned_division_by_constants.pdf \|date=2022-01-08 }}. 2011.</ref> [[Barrett reduction]] uses powers of 2 for the value of ''Y'' to make division by ''Y'' a simple right shift.{{efn\|1=Modern [[compilers]] commonly perform this integer multiply-and-shift optimization; for a constant only known at run-time, however, the program must implement the optimization itself.<ref>{{cite web \|last1=ridiculous_fish \|title=libdivide, optimized integer division \|url=https://libdivide.com/ \|access-date=6 July 2021 \|archive-date=23 November 2021 \|archive-url=https://web.archive.org/web/20211123015446/https://libdivide.com/ \|url-status=live }}</ref>}} Line 344 ⟶ 412: In some cases, division by a constant can be accomplished in even less time by converting the "multiply by a constant" into a [[Multiplication algorithm#Shift and add\|series of shifts and adds or subtracts]].<ref>LaBudde, Robert A.; Golovchenko, Nikolai; Newton, James; and Parker, David; [http://techref.massmind.org/techref/method/math/divconst.htm ''Massmind: "Binary Division by a Constant"''] {{Webarchive\|url=https://web.archive.org/web/20220109215748/http://techref.massmind.org/techref/method/math/divconst.htm \|date=2022-01-09 }}</ref> Of particular interest is division by 10, for which the exact quotient is obtained, with remainder if required.<ref>{{cite journal \|first=R. A. \|last=Vowels \|title=Division by 10 \|journal=Australian Computer Journal \|volume=24 \|issue=3 \|year=1992 \|pages=81–85 }}</ref> == Rounding error == {{expand section\|date=September 2012}} When a division operation is performed, the exact [[quotient]] <math>q</math> and [[remainder]] <math>r</math> are approximated to fit within the computer’s precision limits. The Division Algorithm states: ~~[[Round-off error]] can be introduced by division operations due to limited [[Precision (computer science)\|precision]].~~ <math>[ a = bq + r ]</math> where <math>0 \leq r < \|b\|</math>. In [[floating-point arithmetic]], the quotient <math>q</math> is represented as <math>\tilde{q}</math> and the remainder <math>r</math> as <math>\tilde{r}</math>, introducing [[Round-off error\|rounding errors]] <math>\epsilon_q</math><math>\epsilon_q</math> and <math>\epsilon_r</math>: <math>[ \tilde{q} = q + \epsilon_q ] [ \tilde{r} = r + \epsilon_r ]</math> This rounding causes a small error, which can propagate and accumulate through subsequent calculations. Such errors are particularly pronounced in iterative processes and when subtracting nearly equal values - is told [[Catastrophic cancellation\|loss of significance]]. To mitigate these errors, techniques such as the use of [[Guard digit\|guard digits]] or [[higher precision arithmetic]] are employed.<ref>{{Cite journal \|last=L. Popyack \|first=Jeffrey \|date=June 2000 \|title=Rounding Error \|url=https://www.cs.drexel.edu/~popyack/Courses/CSP/Fa17/extras/Rounding/index.html \|format= \|journal=[[Drexel University]] \|language=English}}</ref><ref>{{Cite journal \|date=8 February 2021 \|title=9. Machine Numbers, Rounding Error and Error Propagation \|url=https://lemesurierb.people.charleston.edu/elementary-numerical-analysis-python/notebooks/machine-numbers-rounding-error-and-error-propagation-python.html \|journal=[[College of Charleston]]}}</ref> {{further \| Floating point}} == See also == * [[Galley division]] * [[Multiplication algorithm]] * [[Pentium FDIV bug]] == Notes == {{notelist}} == References == {{reflist}} == Further reading == * {{cite web \|title=Advanced Arithmetic Techniques \|author-first=John J. G. \|author-last=Savard \|date=2018 \|orig-year=2006 \|work=quadibloc \|url=http://www.quadibloc.com/comp/cp0202.htm \|access-date=2018-07-16 \|url-status=live \|archive-url=https://web.archive.org/web/20180703001722/http://www.quadibloc.com/comp/cp0202.htm \|archive-date=2018-07-03}}