Division algorithm

This squaring of the error at each iteration step{{snd}} the so-called [[Newton's method#Practical considerations|quadratic convergence]] of Newton–Raphson's method{{snd}} has the effect that the number of correct digits in the result roughly ''doubles for every iteration'', a property that becomes extremely valuable when the numbers involved have many digits (e.g. in the large integer ___domain). But it also means that the initial convergence of the method can be comparatively slow, especially if the initial estimate <math>X_0</math> is poorly chosen.
 
====Initial estimate====
For the subproblem of choosing an initial estimate <math>X_0</math>, it is convenient to apply a bit-shift to the divisor ''D'' to scale it so that 0.5&nbsp;≤&nbsp;''D''&nbsp;≤&nbsp;1. Applying the same bit-shift to the numerator ''N'' ensures the quotient does not change. Once within a bounded range, a simple polynomial [[approximation]] can be used to find an initial estimate.
:<math>X_0 = T_0 + T_1 D \approx \frac{1}{D} \,</math>
 
The linear [[approximation]] with minimum worst-case absolute value of the error on the interval <math>[0.5,1]</math> is:
:<math>X_0 = {48 \over 17} - {32 \over 17} D. \,</math>
The coefficients of the linear approximation <math>T_0 + T_1 D</math> are determined as follows. The absolute value of the error is <math>|\varepsilon_0| = |1 - D(T_0 + T_1 D)|</math>. The minimum of the maximum absolute value of the error is determined by the [[Equioscillation theorem|Chebyshev equioscillation theorem]] applied to <math>F(D) = 1 - D(T_0 + T_1 D)</math>. The local minimum of <math>F(D)</math> occurs when <math>F'(D) = 0</math>, which has solution <math>D = -T_0/(2T_1)</math>. The function at that minimum must be of opposite sign as the function at the endpoints, namely, <math>F(1/2) = F(1) = -F(-T_0/(2T_1))</math>. The two equations in the two unknowns have a unique solution <math>T_0 = 48/17</math> and <math>T_1 = -32/17</math>, and the maximum error is <math>F(1) = 1/17</math>. Using this approximation, the absolute value of the error of the initial value is less than
:<math>\vert \varepsilon_0 \vert \leq {1 \over 17} \approx 0.059 . \,</math>
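As an illustrative check (not part of the article), the error bound of this linear estimate can be verified numerically; the sampling grid below is an arbitrary choice:

```python
# Numerically verify that the linear initial estimate
# X0 = 48/17 - 32/17*D has worst-case error |1 - D*X0| of about 1/17
# on the interval [0.5, 1].
max_err = 0.0
n = 10_000  # number of sample points (arbitrary)
for i in range(n + 1):
    d = 0.5 + 0.5 * i / n
    x0 = 48 / 17 - 32 / 17 * d
    max_err = max(max_err, abs(1 - d * x0))
print(max_err)  # ≈ 1/17 ≈ 0.0588
```

The maximum is attained at both endpoints and at the interior extremum, as the equioscillation theorem requires.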
It is possible to generate a polynomial fit of degree larger than 1, computing the coefficients using the [[Remez algorithm]]. The trade-off is that the initial guess requires more computational cycles, hopefully in exchange for fewer iterations of Newton–Raphson.
 
The best quadratic fit to <math>1/D</math> in the interval is
:<math> X := \frac{140}{33} + D \cdot \left(\frac{-64}{11} + D \cdot \frac{256}{99}\right) .</math>
It is chosen to make the error equal to a re-scaled third order [[Chebyshev polynomial]] of the first kind, and gives an absolute value of the error less than or equal to 1/99. This improvement is equivalent to <math>\log_2(\log 99/\log 17) \approx 0.7</math> Newton–Raphson iterations, at a computational cost of less than one iteration.
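The 1/99 bound of the quadratic fit can be checked the same way (an illustrative sketch, not from the article):

```python
# Numerically verify that the quadratic initial estimate has
# worst-case error |1 - D*X0| of about 1/99 on [0.5, 1].
max_err = 0.0
n = 10_000  # number of sample points (arbitrary)
for i in range(n + 1):
    d = 0.5 + 0.5 * i / n
    x0 = 140 / 33 + d * (-64 / 11 + d * 256 / 99)
    max_err = max(max_err, abs(1 - d * x0))
print(max_err)  # ≈ 1/99 ≈ 0.0101
```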
 
Since for this method the [[rate of convergence|convergence]] is exactly quadratic, it follows that, from an initial error <math>\epsilon_0</math>, <math>S</math> iterations will give an answer accurate to
:<math>P = -2^S \log_2 \epsilon_0 - 1 = 2^S \log_2(1/\epsilon_0) - 1</math>
binary places. Typical values are:
{|class=wikitable style="text-align:right;"
|+ Bits of reciprocal accuracy
!rowspan=2| <math>\epsilon_0</math> ||colspan=5| Iterations
|-
! 0 || 1 || 2 || 3 || 4
|-
! <math>1/17</math>
| {{round|{{#expr: (ln17/ln2) - 1}}|2}}
| {{round|{{#expr: 2*(ln17/ln2) - 1}}|2}}
| {{round|{{#expr: 4*(ln17/ln2) - 1}}|2}}
| {{round|{{#expr: 8*(ln17/ln2) - 1}}|2}}
| {{round|{{#expr:16*(ln17/ln2) - 1}}|2}}
|-
! <math>1/99</math>
| {{round|{{#expr: (ln99/ln2) - 1}}|2}}
| {{round|{{#expr: 2*(ln99/ln2) - 1}}|2}}
| {{round|{{#expr: 4*(ln99/ln2) - 1}}|2}}
| {{round|{{#expr: 8*(ln99/ln2) - 1}}|2}}
| {{round|{{#expr:16*(ln99/ln2) - 1}}|2}}
|}
A quadratic initial estimate plus two iterations is accurate enough for IEEE [[single precision]], but three iterations are marginal for [[double precision]]. A linear initial estimate plus four iterations is sufficient for both double and [[extended precision|double extended]] formats.
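The table values and the precision claims can be reproduced with a small helper (the function name is ours, not from the article):

```python
import math

def correct_bits(eps0, s, order=2):
    """Bits of reciprocal accuracy after s iterations of a method of
    convergence order `order`, starting from initial error eps0,
    per P = order**s * log2(1/eps0) - 1."""
    return order ** s * math.log2(1 / eps0) - 1

print(round(correct_bits(1 / 99, 2), 2))  # quadratic estimate + 2 iterations: > 24 (single)
print(round(correct_bits(1 / 99, 3), 2))  # ~52: marginal against the 53-bit double significand
print(round(correct_bits(1 / 17, 4), 2))  # linear estimate + 4 iterations: > 64 (double extended)
```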
 
====Pseudocode====
For example, for a double-precision floating-point division, this method uses 10 multiplies, 9 adds, and 2 shifts.
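A minimal Python sketch of the method (illustrative only; it assumes ''D'' has already been scaled into [0.5, 1] by the bit-shift described above, uses the linear initial estimate, and applies the standard reciprocal update ''X'' := ''X'' + ''X''(1 − ''DX'')):

```python
def nr_divide(n, d, iterations=4):
    """Approximate n/d by Newton-Raphson reciprocal iteration.

    Assumes d has been pre-scaled into [0.5, 1]; a real
    implementation would apply the same bit-shift to n.
    """
    x = 48 / 17 - 32 / 17 * d      # linear initial estimate of 1/d
    for _ in range(iterations):
        x = x + x * (1 - d * x)    # error is squared on each pass
    return n * x

print(nr_divide(1.0, 0.6))  # ≈ 1.6666..., i.e. 1/0.6
```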
 
====Cubic iteration====
There is an iteration which uses three multiplications to cube the error:
:<math> E := 1 - D \cdot X </math>
:<math> Y := X \cdot E </math>
:<math> X := X + Y + Y \cdot E </math>
The ''Y''&sdot;''E'' term is new; without it, the update <math>X := X + Y = X(2 - DX)</math> is the ordinary quadratic Newton–Raphson step. With the extra term, the new error is <math>1 - D(X + Y + YE) = (1 - DX)^3</math>, so each pass exactly cubes the error.
 
Because <math>\log 3 / \log 2 \approx 1.585 > 3/2</math>, tripling the number of correct bits with three multiplications converges slightly faster, per multiplication, than doubling it with two.
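An illustrative numeric check of the cubing behaviour, starting from the quadratic initial estimate (the divisor value is chosen arbitrarily):

```python
d = 0.7                                        # assumed already scaled into [0.5, 1]
x = 140 / 33 + d * (-64 / 11 + d * 256 / 99)   # quadratic initial estimate of 1/d
for _ in range(3):
    e = 1 - d * x
    y = x * e
    x = x + y + y * e    # three multiplications; new error is e**3
    print(abs(1 - d * x))  # shrinks roughly as |e|**3 each pass
```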
The number of correct bits after <math>S</math> iterations is
:<math>P = -3^S \log_2 \epsilon_0 - 1 = 3^S \log_2(1/\epsilon_0) - 1</math>
binary places. Typical values are:
{|class=wikitable style="text-align:right;"
|+ Bits of reciprocal accuracy
!rowspan=2| <math>\epsilon_0</math> ||colspan=4| Iterations
|-
! 0 || 1 || 2 || 3
|-
! <math>1/17</math>
| {{round|{{#expr: (ln17/ln2) - 1}}|2}}
| {{round|{{#expr: 3*(ln17/ln2) - 1}}|2}}
| {{round|{{#expr: 9*(ln17/ln2) - 1}}|2}}
| {{round|{{#expr:27*(ln17/ln2) - 1}}|2}}
|-
! <math>1/99</math>
| {{round|{{#expr: (ln99/ln2) - 1}}|2}}
| {{round|{{#expr: 3*(ln99/ln2) - 1}}|2}}
| {{round|{{#expr: 9*(ln99/ln2) - 1}}|2}}
| {{round|{{#expr:27*(ln99/ln2) - 1}}|2}}
|}
It is also possible to use a mixture of quadratic and cubic iterations.
 
Using higher degree polynomials in either the initialization or the iteration results in a degradation of performance because the extra multiplications required would be better spent on doing more iterations.{{cn|date=January 2025}}
 
==={{anchor|AEGP}}Goldschmidt division===