Jenkins–Traub algorithm

{{Short description|Root-finding algorithm for polynomials}}
The '''Jenkins–Traub algorithm for polynomial zeros''' is a fast globally convergent iterative [[Root-finding algorithms#Roots of polynomials|polynomial root-finding]] method published in 1970 by [[Michael A. Jenkins]] and [[Joseph F. Traub]]. They gave two variants, one for general polynomials with complex coefficients, commonly known as the "CPOLY" algorithm, and a more complicated variant for the special case of polynomials with real coefficients, commonly known as the "RPOLY" algorithm. The latter is "practically a standard in black-box polynomial root-finders".<ref>Press, W. H., Teukolsky, S. A., Vetterling, W. T. and Flannery, B. P. (2007), Numerical Recipes: The Art of Scientific Computing, 3rd ed., Cambridge University Press, page 470.</ref>
 
=== Root-finding procedure ===
 
Starting with the current polynomial ''P''(''X'') of degree ''n'', the aim is to compute the smallest root <math>\alpha</math> of ''P''(''X''). The polynomial can then be split into a linear factor and the remaining polynomial factor
<math display="block">P(X)=(X-\alpha)\bar H(X).</math>
Other root-finding methods strive primarily to improve the root and thus the first factor. The main idea of the Jenkins–Traub method is to incrementally improve the second factor.
 
To that end, a sequence of so-called ''H'' polynomials is constructed. These polynomials are all of degree ''n''&nbsp;&minus;&nbsp;1 and are supposed to converge to the factor <math>\bar H(X)</math> of ''P''(''X'') containing (the linear factors of) all the remaining roots. The sequence of ''H'' polynomials occurs in two variants, an unnormalized variant that allows easy theoretical insights and a normalized variant of <math>\bar H</math> polynomials that keeps the coefficients in a numerically sensible range.
The construction of the ''H'' polynomials <math>\left(H^{(\lambda)}(z)\right)_{\lambda=0,1,2,\dots}</math> is guided by a sequence of [[complex number]]s <math>(s_\lambda)_{\lambda=0,1,2,\dots}</math> called shifts. These shifts themselves depend, at least in the third stage, on the previous ''H'' polynomials. The ''H'' polynomials are defined as the solution to the implicit recursion
<math display="block">
H^{(0)}(z)=P^\prime(z)
\quad\text{and}\quad
H^{(\lambda+1)}(z)=\frac1{z-s_\lambda}\left(H^{(\lambda)}(z)-\frac{H^{(\lambda)}(s_\lambda)}{P(s_\lambda)}\,P(z)\right),
</math>
where the polynomial division is exact.
 
Algorithmically, one would use long division by the linear factor as in the [[Horner scheme]] or [[Ruffini rule]] to evaluate the polynomials at <math>s_\lambda</math> and obtain the quotients at the same time. With the resulting quotients ''p''(''X'') and ''h''(''X'') as intermediate results the next ''H'' polynomial is obtained as
<math display="block">
\left.\begin{align}
P(z)&=p(z)\cdot(z-s_\lambda)+P(s_\lambda)\\
H^{(\lambda)}(z)&=h(z)\cdot(z-s_\lambda)+H^{(\lambda)}(s_\lambda)
\end{align}\right\}
\implies
H^{(\lambda+1)}(z)=h(z)-\frac{H^{(\lambda)}(s_\lambda)}{P(s_\lambda)}\,p(z)\,.
</math>
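The following Python sketch illustrates this update. It is an illustrative reading of the recursion above, not the published CPOLY/RPOLY code; polynomials are represented as coefficient lists with the leading coefficient first, and the names <code>horner</code> and <code>next_h</code> are chosen only for this example.
<syntaxhighlight lang="python">
def horner(coeffs, s):
    """Divide a polynomial by (z - s) with the Horner scheme.

    Returns the quotient coefficients and the value of the
    polynomial at s (the remainder of the division)."""
    acc = coeffs[0]
    quotient = [acc]
    for c in coeffs[1:]:
        acc = acc * s + c
        quotient.append(acc)
    value = quotient.pop()              # the final accumulator is P(s)
    return quotient, value

def next_h(P, H, s):
    """One step of the recursion: H_next(z) = h(z) - (H(s)/P(s)) * p(z)."""
    p, P_s = horner(P, s)               # quotient p(z) and value P(s)
    h, H_s = horner(H, s)               # quotient h(z) and value H(s)
    h = [0.0] * (len(p) - len(h)) + h   # pad h(z) to the degree of p(z)
    ratio = H_s / P_s
    return [hc - ratio * pc for hc, pc in zip(h, p)]
</syntaxhighlight>
Each update thus costs two synthetic divisions, i.e. O(''n'') operations, regardless of the stage in which it is used.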
 
==== Stage one: no-shift process ====
 
For <math>\lambda = 0,1,\dots, M-1</math> set <math>s_\lambda=0</math>. Usually ''M''&nbsp;=&nbsp;5 is chosen for polynomials of moderate degree up to ''n''&nbsp;=&nbsp;50. This stage is not necessary from theoretical considerations alone, but it is useful in practice: it emphasizes in the ''H'' polynomials the cofactor(s) (of the linear factor) of the smallest root(s).
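In code, stage one amounts to repeated updates with shift zero, starting from <math>H^{(0)}=P'</math>. A minimal sketch, reusing <code>next_h</code> from the sketch above and assuming <math>P(0)\neq 0</math> (i.e. any zero roots have already been divided out):
<syntaxhighlight lang="python">
def stage_one(P, M=5):
    """No-shift stage: H0 = P', then M updates with s = 0."""
    n = len(P) - 1                                   # degree of P
    H = [(n - i) * c for i, c in enumerate(P[:-1])]  # coefficients of P'
    for _ in range(M):
        H = next_h(P, H, 0.0)                        # requires P(0) != 0
    return H
</syntaxhighlight>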
 
==== Stage two: fixed-shift process ====
 
The shift for this stage is determined as some point close to the smallest root of the polynomial. It is quasi-randomly located on the circle with the inner root radius, which in turn is estimated as the positive solution of the equation
<math display="block">
R^n+|a_{n-1}|\,R^{n-1}+\dots+|a_1|\,R=|a_0|\,.
</math>
Since the left side is a convex function and increases monotonically from zero to infinity, this equation is easy to solve, for instance by [[Newton's method]].
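A sketch of this radius computation in the same style, assuming a monic polynomial with coefficient list <math>[1, a_{n-1}, \dots, a_0]</math>, leading coefficient first as in the earlier sketches (the function name is again only illustrative):
<syntaxhighlight lang="python">
def inner_root_radius(P, tol=1e-12):
    """Newton iteration for the positive root R of
    R^n + |a_(n-1)| R^(n-1) + ... + |a_1| R - |a_0| = 0."""
    n = len(P) - 1
    f = [1.0] + [abs(c) for c in P[1:-1]] + [-abs(P[-1])]
    df = [(n - i) * c for i, c in enumerate(f[:-1])]  # derivative f'
    R = abs(P[-1]) ** (1.0 / n)    # f(R) >= 0 here, so Newton descends
    while True:
        f_R = df_R = 0.0
        for c in f:
            f_R = f_R * R + c      # evaluate f by the Horner scheme
        for c in df:
            df_R = df_R * R + c
        step = f_R / df_R
        R -= step
        if abs(step) <= tol * R:
            return R
</syntaxhighlight>
Because the left side is convex and the start value lies at or above the root, the Newton steps decrease monotonically towards it.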
 
Now choose <math>s=R\cdot \exp(i\,\phi_\text{random})</math> on the circle of this radius. The sequence of polynomials <math>H^{(\lambda+1)}(z)</math>, <math>\lambda=M,M+1,\dots,L-1</math>, is generated with the fixed shift value <math>s_\lambda = s</math>. This creates an asymmetry relative to the previous stage which increases the chance that the ''H'' polynomial moves towards the cofactor of a single root.
During this iteration, the current approximation for the root
 
<math display="block">t_\lambda=s-\frac{P(s)}{\bar H^{(\lambda)}(s)}</math>
is traced. The second stage is terminated as successful if the conditions
<math display="block">
|t_{\lambda+1}-t_\lambda|<\tfrac12\,|t_\lambda|
\quad\text{and}\quad
|t_\lambda-t_{\lambda-1}|<\tfrac12\,|t_{\lambda-1}|
</math>
are simultaneously met. This limits the relative step size of the iteration, ensuring that the approximation sequence stays in the range of the smaller roots. If there is no success after some number of iterations, a different random point on the circle is tried. Typically 9 iterations are used for polynomials of moderate degree, with a doubling strategy in the case of repeated failures.
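Continuing the sketch (and reusing <code>horner</code> and <code>next_h</code>), one possible reading of this stage is shown below. The normalization <math>\bar H</math> is realized by rescaling to a unit leading coefficient; division-by-zero guards are omitted and the restart policy is simpler than in the published implementations.
<syntaxhighlight lang="python">
import cmath
import random

def normalize(H):
    """The 'bar H' variant: scale so the leading coefficient is 1."""
    return [c / H[0] for c in H]

def stage_two(P, H, R, limit=9):
    """Fixed-shift stage on the circle |s| = R with the halving tests."""
    while True:
        s = R * cmath.exp(1j * random.uniform(0.0, 2.0 * cmath.pi))
        t, Hs = [], H
        for _ in range(limit):
            Hs = normalize(next_h(P, Hs, s))
            _, P_s = horner(P, s)
            _, Hbar_s = horner(Hs, s)
            t.append(s - P_s / Hbar_s)        # root estimate t_lambda
            if (len(t) >= 3
                    and abs(t[-1] - t[-2]) < 0.5 * abs(t[-2])
                    and abs(t[-2] - t[-3]) < 0.5 * abs(t[-3])):
                return Hs, t[-1]              # H^(L) and s_L = t_L
        limit *= 2                            # doubling strategy, new shift
</syntaxhighlight>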
 
==== Stage three: variable-shift process ====
The <math>H^{(\lambda+1)}(X)</math> polynomials are now generated using the variable shifts <math>s_{\lambda},\quad\lambda=L,L+1,\dots</math>, which are determined by
<math display="block">s_L = t_L = s- \frac{P(s)}{\bar H^{(L)}(s)}</math>
being the last root estimate of the second stage and
<math display="block">s_{\lambda+1}=s_\lambda- \frac{P(s_\lambda)}{\bar H^{(\lambda+1)}(s_\lambda)}, \quad \lambda=L,L+1,\dots,</math>
It can be shown that, provided ''L'' is chosen sufficiently large, ''s''<sub>λ</sub> always converges to a root of ''P''.
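The last stage is then a short loop in the same Python sketch; real implementations add safeguards against stalling and division by zero that are omitted here:
<syntaxhighlight lang="python">
def stage_three(P, H, s, tol=1e-14, limit=50):
    """Variable-shift stage: s <- s - P(s)/Hbar(s) with updated H."""
    for _ in range(limit):
        H = normalize(next_h(P, H, s))  # H^(lambda+1), built with shift s_lambda
        _, P_s = horner(P, s)
        _, Hbar_s = horner(H, s)
        step = P_s / Hbar_s
        s -= step                       # s_(lambda+1) = s_lambda - P/Hbar
        if abs(step) <= tol * abs(s):
            return s                    # root of P, to working precision
    raise ArithmeticError("no convergence; restart stage two")
</syntaxhighlight>
Once a root has been found, the polynomial is deflated by the corresponding linear factor (one more synthetic division) and the whole procedure is repeated on the quotient.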
 
The algorithm converges for any distribution of roots, but may fail to find all roots of the polynomial. Furthermore, the convergence is slightly faster than the [[Rate of convergence|quadratic convergence]] of the Newton–Raphson method; however, it uses at least one and a half times as many operations per step: two polynomial evaluations for Newton vs. three polynomial evaluations in the third stage.
 
==What gives the algorithm its power?==
For the distance of the next iterate to the smallest root one obtains an estimate of the form
<math display="block">
s_{\lambda+1}-\alpha_1
= O\left(
\frac{|\alpha_1-s_\lambda|^2}{|\alpha_2-s_\lambda|}
\right)
</math> giving rise to a higher than quadratic convergence order of <math>\phi^2=1+\phi\approx 2.618</math>, where <math>\phi=\tfrac12(1+\sqrt5)</math> is the [[golden ratio]].
 
=== Interpretation as inverse power iteration ===
All stages of the Jenkins–Traub algorithm can be expressed as the linear algebra problem of determining the eigenvalues of a special matrix. The principal idea is to interpret the factorization
<math display="block">P(X)=(X-\alpha_1)\cdot P_1(X)</math>
with a root <math>\alpha_1\in\C</math> and <math>P_1(X) = P(X) / (X-\alpha_1)</math> the remaining factor of degree ''n''&nbsp;&minus;&nbsp;1 as the eigenvector equation for the multiplication with the variable ''X'', followed by remainder computation with divisor ''P''(''X''),
<math display="block">M_X(H) = (X\cdot H(X)) \bmod P(X)\,.</math>
This maps polynomials of degree at most ''n''&nbsp;&minus;&nbsp;1 to polynomials of degree at most ''n''&nbsp;&minus;&nbsp;1. The eigenvalues of this map are the roots of ''P''(''X''), since the eigenvector equation reads
<math display="block">0 = (M_X-\alpha\cdot id)(H)=((X-\alpha)\cdot H) \bmod P\,,</math>
which implies that <math>(X-\alpha)\cdot H(X)=C\cdot P(X)</math>, that is, <math>(X-\alpha)</math> is a linear factor of ''P''(''X''). In the monomial basis the linear map <math>M_X</math> is represented by a [[companion matrix]] of the polynomial ''P'', as
<math display="block"> M_X(H) = \sum_{m=0}^{n-1}H_mX^{m+1}-H_{n-1}\left(X^n+\sum_{m=0}^{n-1}a_mX^m\right) = \sum_{m=1}^{n-1}(H_{m-1}-a_mH_{n-1})X^m-a_0H_{n-1}\,,</math>
the resulting transformation matrix is
<math display="block">A=\begin{pmatrix}
0 & 0 & \dots & 0 & -a_0 \\
1 & 0 & \dots & 0 & -a_1 \\
0 & 1 & \dots & 0 & -a_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \dots & 1 & -a_{n-1}
\end{pmatrix}\,.</math>
To this matrix the [[inverse power iteration]] is applied in the three variants of no shift, constant shift and generalized Rayleigh shift in the three stages of the algorithm. It is more efficient to perform the linear algebra operations in polynomial arithmetic and not by matrix operations; however, the properties of the inverse power iteration remain the same.
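For illustration, a small [[NumPy]] sketch that builds this companion matrix from a monic coefficient list (leading coefficient first, as in the earlier sketches) and performs one shifted inverse-power step; the three stages correspond to the choices <math>s=0</math>, <math>s</math> fixed, and <math>s</math> updated in each step. Note that the vector here holds the coefficients <math>H_0,\dots,H_{n-1}</math> with the lowest degree first, the opposite order to the list convention used above.
<syntaxhighlight lang="python">
import numpy as np

def companion(P):
    """Companion matrix A of a monic polynomial P (leading coeff. first)."""
    n = len(P) - 1
    A = np.zeros((n, n), dtype=complex)
    A[1:, :-1] = np.eye(n - 1)          # shift part: X^m -> X^(m+1)
    A[:, -1] = [-c for c in P[:0:-1]]   # last column: -a_0, ..., -a_(n-1)
    return A

def inverse_power_step(A, H, s):
    """One inverse power step with shift s: solve (A - sI) v = H."""
    v = np.linalg.solve(A - s * np.eye(A.shape[0]), H)
    return v / np.linalg.norm(v)        # renormalize the iterate
</syntaxhighlight>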
The methods have been extensively tested by many people.{{Who|date=December 2021}} As predicted they enjoy faster than quadratic convergence for all distributions of zeros.
 
However, there are polynomials which can cause loss of precision<ref>{{Cite web|date=8 August 2005|title=William Kahan Oral history interview by Thomas Haigh|url=http://history.siam.org/oralhistories/kahan.htm|url-status=live|access-date=2021-12-03|website=The History of Numerical Analysis and Scientific Computing|publication-place=Philadelphia, PA}}</ref> as illustrated by the following example. The polynomial has all its zeros lying on two half-circles of different radii. [[James H. Wilkinson|Wilkinson]] recommends that, for stable deflation, the smaller zeros be computed first. The second-stage shifts are therefore chosen so that the zeros on the smaller half-circle are found first. After deflation the polynomial with the zeros on the remaining half-circle is known to be ill-conditioned if the degree is large; see Wilkinson,<ref>Wilkinson, J. H. (1963), Rounding Errors in Algebraic Processes, Prentice Hall, Englewood Cliffs, N.J.</ref> p.&nbsp;64. The original polynomial was of degree 60 and suffered severe deflation instability.
 
==References==
{{Reflist}}
{{DEFAULTSORT:Jenkins-Traub Algorithm}}
[[Category:Numerical analysis]]
[[Category:Polynomial factorization algorithms]]