This squaring of the error at each iteration step{{snd}} the so-called [[Newton's method#Practical considerations|quadratic convergence]] of Newton–Raphson's method{{snd}} has the effect that the number of correct digits in the result roughly ''doubles for every iteration'', a property that becomes extremely valuable when the numbers involved have many digits (e.g. in the large integer ___domain). But it also means that the initial convergence of the method can be comparatively slow, especially if the initial estimate <math>X_0</math> is poorly chosen.
====Initial estimate====
For the subproblem of choosing an initial estimate <math>X_0</math>, it is convenient to apply a bit-shift to the divisor ''D'' to scale it so that 0.5 ≤ ''D'' ≤ 1; the same bit-shift applied to the numerator ''N'' leaves the quotient unchanged. A linear approximation of <math>1/D</math> on this interval can then serve as the initial estimate:
:<math>X_0 = {48 \over 17} - {32 \over 17} D .</math>
The coefficients of the linear approximation <math>T_0 + T_1 D</math> are determined as follows. The absolute value of the error is <math>|\varepsilon_0| = |1 - D(T_0 + T_1 D)|</math>. Choosing <math>T_0</math> and <math>T_1</math> so that this error equioscillates over [0.5, 1]{{snd}} taking the same maximum magnitude at <math>D = 0.5</math>, <math>D = 0.75</math> and <math>D = 1</math>{{snd}} minimizes that maximum, yields the coefficients above, and gives
:<math>\vert \varepsilon_0 \vert \leq {1 \over 17} \approx 0.059 .</math>
It is possible to generate a polynomial fit of degree larger than 1, computing the coefficients using the [[Remez algorithm]]. The trade-off is that the initial guess requires more computational cycles but hopefully in exchange for fewer iterations of Newton–Raphson.
The best quadratic fit to <math>1/D</math> in the interval is
:<math> X := \frac{140}{33} + D \cdot \left(\frac{-64}{11} + D \cdot \frac{256}{99}\right) .</math>
It is chosen to make the error equal to a re-scaled third order [[Chebyshev polynomial]] of the first kind, and gives an absolute value of the error less than or equal to 1/99. This improvement is equivalent to <math>\log_2(\log 99/\log 17) \approx 0.7</math> Newton–Raphson iterations, at a computational cost of less than one iteration.
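The following C sketch (illustrative only, not drawn from the cited sources; function names are invented for this example) evaluates both initial estimates and scans the interval [0.5, 1] to confirm the residual bounds 1/17 and 1/99 numerically:
<syntaxhighlight lang="c">
#include <math.h>
#include <stdio.h>

/* Linear initial estimate: X0 = 48/17 - 32/17 * D, for 0.5 <= D <= 1. */
static double recip_init_linear(double d)  { return 48.0/17.0 - 32.0/17.0 * d; }

/* Quadratic initial estimate, evaluated in Horner form. */
static double recip_init_quadratic(double d) {
    return 140.0/33.0 + d * (-64.0/11.0 + d * (256.0/99.0));
}

int main(void) {
    double max_lin = 0.0, max_quad = 0.0;
    /* Scan [0.5, 1] and record the worst residual |1 - D*X0| for each estimate. */
    for (double d = 0.5; d <= 1.0; d += 1e-6) {
        double e_lin  = fabs(1.0 - d * recip_init_linear(d));
        double e_quad = fabs(1.0 - d * recip_init_quadratic(d));
        if (e_lin  > max_lin)  max_lin  = e_lin;
        if (e_quad > max_quad) max_quad = e_quad;
    }
    /* Expected bounds: 1/17 ~= 0.0588 and 1/99 ~= 0.0101. */
    printf("max linear residual    = %.6f (1/17 = %.6f)\n", max_lin, 1.0/17.0);
    printf("max quadratic residual = %.6f (1/99 = %.6f)\n", max_quad, 1.0/99.0);
    return 0;
}
</syntaxhighlight>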
Since for this method the [[rate of convergence|convergence]] is exactly quadratic, it follows that, from an initial error <math>\epsilon_0</math>, <math>S</math> iterations will give an answer accurate to
:<math>P = -2^S \log_2 \epsilon_0 - 1 = 2^S \log_2(1/\epsilon_0) - 1</math>
binary places. Typical values are:
{|class=wikitable style="text-align:right;"
|+ Binary digits of reciprocal accuracy
!rowspan=2| <math>\epsilon_0</math> ||colspan=5| Iterations
|-
! 0 || 1 || 2 || 3 || 4
|-
! <math>1/17</math>
| {{round|{{#expr: (ln17/ln2) - 1}}|2}}
| {{round|{{#expr: 2*(ln17/ln2) - 1}}|2}}
| {{round|{{#expr: 4*(ln17/ln2) - 1}}|2}}
| {{round|{{#expr: 8*(ln17/ln2) - 1}}|2}}
| {{round|{{#expr:16*(ln17/ln2) - 1}}|2}}
|-
! <math>1/99</math>
| {{round|{{#expr: (ln99/ln2) - 1}}|2}}
| {{round|{{#expr: 2*(ln99/ln2) - 1}}|2}}
| {{round|{{#expr: 4*(ln99/ln2) - 1}}|2}}
| {{round|{{#expr: 8*(ln99/ln2) - 1}}|2}}
| {{round|{{#expr:16*(ln99/ln2) - 1}}|2}}
|}
A quadratic initial estimate plus two iterations is accurate enough for IEEE [[single precision]], but three iterations are marginal for [[double precision]]. A linear initial estimate plus four iterations is sufficient for both double and [[extended precision|double extended]] formats.
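As an illustration only, the table entries and the precision claims above can be reproduced with a short C sketch of the formula <math>P = 2^S \log_2(1/\epsilon_0) - 1</math> (the helper name is invented):
<syntaxhighlight lang="c">
#include <math.h>
#include <stdio.h>

/* Bits of reciprocal accuracy after s quadratic iterations, starting from error e0:
 * P = 2^s * log2(1/e0) - 1. */
static double bits_after(double e0, int s) {
    return ldexp(log2(1.0 / e0), s) - 1.0;
}

int main(void) {
    /* Compare against IEEE significand widths: single = 24, double = 53,
     * double extended = 64 bits. */
    for (int s = 0; s <= 4; s++)
        printf("S=%d: linear start %6.2f bits, quadratic start %6.2f bits\n",
               s, bits_after(1.0/17.0, s), bits_after(1.0/99.0, s));
    return 0;
}
</syntaxhighlight>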
====Pseudocode====
For example, for a double-precision floating-point division, this method uses 10 multiplies, 9 adds, and 2 shifts.
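As a simplified, self-contained C sketch of the method (scaling, linear initial estimate, four quadratic iterations, un-scaling); it is illustrative only, ignores signs, zeros, infinities and NaNs, and the function name is invented:
<syntaxhighlight lang="c">
#include <math.h>
#include <stdio.h>

/* Illustrative Newton–Raphson division n/d in double precision (assumes d > 0).
 * Scales d into [0.5, 1), uses the linear initial estimate 48/17 - 32/17*d',
 * then applies four quadratic iterations X = X + X*(1 - d'*X). */
static double nr_divide(double n, double d) {
    int e;
    double ds = frexp(d, &e);                 /* d = ds * 2^e with ds in [0.5, 1) */
    double x = 48.0/17.0 - 32.0/17.0 * ds;    /* initial estimate of 1/ds */
    for (int i = 0; i < 4; i++) {
        double err = 1.0 - ds * x;            /* current residual */
        x = x + x * err;                      /* quadratic (Newton) refinement step */
    }
    /* 1/d = (1/ds) * 2^(-e); multiply by n to form the quotient. */
    return n * ldexp(x, -e);
}

int main(void) {
    printf("355/113 = %.17g (libm: %.17g)\n", nr_divide(355.0, 113.0), 355.0/113.0);
    printf("1/3     = %.17g (libm: %.17g)\n", nr_divide(1.0, 3.0), 1.0/3.0);
    return 0;
}
</syntaxhighlight>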
====Cubic iteration====
There is an iteration which uses three multiplications to cube the error:
:<math> E := 1 - D \cdot X </math>
:<math> Y := X \cdot E </math>
:<math> X := X + Y + Y \cdot E </math>
The ''Y''⋅''E'' term is new.
The cubic iteration triples the number of correct bits using three multiplications, whereas the quadratic iteration doubles it using two; because <math>\log 3 / \log 2 \approx 1.585 > 3/2</math>, the cubic iteration converges slightly faster per multiplication.
From an initial error <math>\epsilon_0</math>, <math>S</math> cubic iterations give an answer accurate to
:<math>P = -3^S \log_2 \epsilon_0 - 1 = 3^S \log_2(1/\epsilon_0) - 1</math>
binary places. Typical values are:
{|class=wikitable style="text-align:right;"
|+ Bits of reciprocal accuracy
!rowspan=2| <math>\epsilon_0</math> ||colspan=4| Iterations
|-
! 0 || 1 || 2 || 3
|-
! <math>1/17</math>
| {{round|{{#expr: (ln17/ln2) - 1}}|2}}
| {{round|{{#expr: 3*(ln17/ln2) - 1}}|2}}
| {{round|{{#expr: 9*(ln17/ln2) - 1}}|2}}
| {{round|{{#expr:27*(ln17/ln2) - 1}}|2}}
|-
! <math>1/99</math>
| {{round|{{#expr: (ln99/ln2) - 1}}|2}}
| {{round|{{#expr: 3*(ln99/ln2) - 1}}|2}}
| {{round|{{#expr: 9*(ln99/ln2) - 1}}|2}}
| {{round|{{#expr:27*(ln99/ln2) - 1}}|2}}
|}
It is also possible to use a mixture of quadratic and cubic iterations.
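A minimal C sketch of the cubic step (illustrative only; it assumes a divisor already scaled to [0.5, 1) and uses the quadratic initial estimate from above, with invented function names) is:
<syntaxhighlight lang="c">
#include <math.h>
#include <stdio.h>

/* One cubic refinement step for x ~= 1/d: three multiplications, cubes the error. */
static double cubic_step(double x, double d) {
    double e = 1.0 - d * x;   /* residual error */
    double y = x * e;
    return x + y + y * e;     /* x*(1 + e + e^2); the y*e term is the new one */
}

int main(void) {
    /* Scaled divisor in [0.5, 1) and the quadratic initial estimate (error <= 1/99). */
    double d = 0.6;
    double x = 140.0/33.0 + d * (-64.0/11.0 + d * (256.0/99.0));
    for (int i = 0; i < 2; i++)
        x = cubic_step(x, d);   /* two cubic steps: about 9*log2(99) - 1 ~ 59 bits */
    printf("x = %.17g, 1/d = %.17g\n", x, 1.0/d);
    return 0;
}
</syntaxhighlight>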
Using higher degree polynomials in either the initialization or the iteration results in a degradation of performance because the extra multiplications required would be better spent on doing more iterations.{{cn|date=January 2025}}
==={{anchor|AEGP}}Goldschmidt division===