Jenkins–Traub algorithm

This article describes the complex variant. Given a polynomial ''P'',
:<math display="block">P(z) = \sum_{i=0}^na_iz^{n-i}, \quad a_0=1,\quad a_n\ne 0</math>
with complex coefficients it computes approximations to the ''n'' zeros <math>\alpha_1,\alpha_2,\dots,\alpha_n</math> of ''P''(''z''), one at a time in roughly increasing order of magnitude. After each root is computed, its linear factor is removed from the polynomial. Using this ''deflation'' guarantees that each root is computed only once and that all roots are found.
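For illustration, deflation amounts to a single synthetic (Horner) division by the computed root. A minimal Python sketch, assuming the polynomial is given as a list of coefficients in descending powers with <code>a[0] == 1</code> (the function name is illustrative, not from the published algorithm):
<syntaxhighlight lang="python">
def deflate(a, alpha):
    """Synthetic division of the monic polynomial a (descending
    coefficients, a[0] == 1) by the linear factor (z - alpha).
    The last value b is the remainder P(alpha), ~0 when alpha is a root."""
    q, b = [], 0j
    for coeff in a:
        b = coeff + b * alpha
        q.append(b)
    return q[:-1]  # coefficients of P(z)/(z - alpha), degree n - 1
</syntaxhighlight>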
 
The construction of the ''H'' polynomials <math>\left(H^{(\lambda)}(z)\right)_{\lambda=0,1,2,\dots}</math> depends on a sequence of complex numbers <math>(s_\lambda)_{\lambda=0,1,2,\dots}</math> called shifts. These shifts themselves depend, at least in the third stage, on the previous ''H'' polynomials. The ''H'' polynomials are defined as the solution to the implicit recursion
:<math display="block">
H^{(0)}(z)=P^\prime(z)
</math> and <math display="block">
(X-s_\lambda)\cdot H^{(\lambda+1)}(X)\equiv H^{(\lambda)}(X)\pmod{P(X)}\ .
</math>
A direct solution to this implicit equation is
:<math display="block">
H^{(\lambda+1)}(X)
=\frac1{X-s_\lambda}\cdot
\left(H^{(\lambda)}(X)-\frac{H^{(\lambda)}(s_\lambda)}{P(s_\lambda)}\,P(X)\right),
</math>
where the polynomial division is exact.
 
Algorithmically, one would use for instance the [[Horner scheme]] or [[Ruffini rule]] to evaluate the polynomials at <math>s_\lambda</math> and obtain the quotients at the same time. With the resulting quotients ''p''(''X'') and ''h''(''X'') as intermediate results, the next ''H'' polynomial is obtained as
:<math display="block">
\left.\begin{align}
P(X)&=p(X)\cdot(X-s_\lambda)+P(s_\lambda)\\
H^{(\lambda)}(X)&=h(X)\cdot(X-s_\lambda)+H^{(\lambda)}(s_\lambda)
\end{align}\right\}
\implies
H^{(\lambda+1)}(X)
=h(X)-\frac{H^{(\lambda)}(s_\lambda)}{P(s_\lambda)}\,p(X)\,.
</math>
Since the highest degree coefficient is obtained from ''P''(''X''), the leading coefficient of <math>H^{(\lambda+1)}(X)</math> is <math>-\tfrac{H^{(\lambda)}(s_\lambda)}{P(s_\lambda)}</math>. Dividing this out yields the normalized ''H'' polynomial
:<math display="block">
\bar H^{(\lambda+1)}(X)
=\frac1{X-s_\lambda}\cdot
\left(P(X)-\frac{P(s_\lambda)}{H^{(\lambda)}(s_\lambda)}\,H^{(\lambda)}(X)\right)
=p(X)-\frac{P(s_\lambda)}{H^{(\lambda)}(s_\lambda)}\,h(X)\,.
</math>
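Combining the two synthetic divisions gives one update of the recursion. A Python sketch under the same coefficient convention (<code>horner</code> and <code>h_step</code> are illustrative names; the result is the normalized <math>\bar H^{(\lambda+1)}</math>):
<syntaxhighlight lang="python">
def horner(a, s):
    """Evaluate the polynomial with descending coefficients a at s.
    Returns (a(s), coefficients of the quotient a(X) // (X - s))."""
    q, b = [], 0j
    for coeff in a:
        b = coeff + b * s
        q.append(b)
    return q[-1], q[:-1]

def h_step(p, h, s):
    """One step of the recursion: from P and H^(lambda) build the
    normalized Hbar^(lambda+1)(X) = p(X) - (P(s)/H(s)) * h(X).
    The result is independent of the scaling of the input h."""
    P_s, p_quot = horner(p, s)          # P(s) and p(X)
    H_s, h_quot = horner(h, s)          # H(s) and h(X)
    c = P_s / H_s
    h_quot = [0j] * (len(p_quot) - len(h_quot)) + h_quot  # align degrees
    return [pc - c * hc for pc, hc in zip(p_quot, h_quot)]
</syntaxhighlight>
Each update thus costs two synthetic divisions, i.e. <math>O(n)</math> arithmetic operations.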
 
==== Stage one: no-shift process ====
For <math>\lambda = 0,1,\dots, M-1</math> set <math>s_\lambda=0</math>. Usually ''M''&nbsp;=&nbsp;5 is chosen for polynomials of moderate degree up to ''n''&nbsp;=&nbsp;50. This stage is not necessary from theoretical considerations alone, but is useful in practice: it emphasizes in the ''H'' polynomials the cofactor (of the linear factor) of the smallest root.
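In code, stage one starts from <math>P'</math> and applies <math>M</math> zero-shift updates; a sketch building on the hypothetical <code>h_step</code> above:
<syntaxhighlight lang="python">
def stage_one(p, M=5):
    """No-shift process: start from H^(0) = P' and apply M updates with
    s = 0, which accentuates the cofactor of the smallest root."""
    n = len(p) - 1
    h = [(n - i) * c for i, c in enumerate(p[:-1])]   # P'(X), descending
    for _ in range(M):
        h = h_step(p, h, 0j)
    return h
</syntaxhighlight>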
 
==== Stage two: fixed-shift process ====
The shift for this stage is determined as some point close to the smallest root of the polynomial. It is quasi-randomly located on the circle of the inner root radius, which in turn is estimated as the positive solution of the equation
:<math display="block">
R^n+|a_{1}|\,R^{n-1}+\dots+|a_{n-1}|\,R=|a_n|\,.
</math>
Since the left side is a convex function and increases monotonically from zero to infinity, this equation is easy to solve, for instance by [[Newton's method]].
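A sketch of this radius estimate with Newton's method, under the same coefficient convention (starting guess and tolerance are arbitrary choices; a crude result suffices, since the radius only seeds the shift):
<syntaxhighlight lang="python">
def inner_root_radius(a, tol=1e-4):
    """Positive solution R of R^n + |a_1| R^(n-1) + ... + |a_{n-1}| R = |a_n|,
    found by Newton's method (a is monic, descending, a[-1] != 0)."""
    f_coeffs = [1.0] + [abs(c) for c in a[1:-1]] + [-abs(a[-1])]
    R = abs(a[-1]) ** (1.0 / (len(a) - 1))   # rough starting guess
    while True:
        f, df = 0.0, 0.0
        for c in f_coeffs:                   # Horner for f(R) and f'(R)
            df = f + df * R
            f = c + f * R
        step = f / df
        R -= step
        if abs(step) <= tol * abs(R):
            return R
</syntaxhighlight>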
 
Now choose <math>s=R\cdot \exp(i\,\phi_\text{random})</math> on the circle of this radius. The sequence of polynomials <math>H^{(\lambda+1)}(z)</math>, <math>\lambda=M,M+1,\dots,L-1</math>, is generated with the fixed shift value <math>s_\lambda = s</math>. During this iteration, the current approximation for the root
:<math display="block">t_\lambda=s-\frac{P(s)}{\bar H^{(\lambda)}(s)}</math>
is traced. The second stage is finished successfully if the conditions
:<math display="block">
|t_{\lambda+1}-t_\lambda|<\tfrac12\,|t_\lambda|
</math> and <math display="block">
|t_\lambda-t_{\lambda-1}|<\tfrac12\,|t_{\lambda-1}|
</math>
are simultaneously satisfied.
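A sketch of this stage, built on the hypothetical helpers above (the iteration cap <code>L</code> is an arbitrary choice):
<syntaxhighlight lang="python">
import cmath, random

def stage_two(p, h, L=20):
    """Fixed-shift process: iterate with a fixed quasi-random shift s on the
    circle of the inner root radius, tracing t = s - P(s)/Hbar(s); succeed
    once two consecutive corrections have each shrunk by a factor of two."""
    s = inner_root_radius(p) * cmath.exp(2j * cmath.pi * random.random())
    ts = []                                   # traced root approximations
    for _ in range(L):
        h = h_step(p, h, s)
        ts.append(s - horner(p, s)[0] / horner(h, s)[0])
        if (len(ts) >= 3
                and abs(ts[-1] - ts[-2]) < 0.5 * abs(ts[-2])
                and abs(ts[-2] - ts[-3]) < 0.5 * abs(ts[-3])):
            return h, ts[-1]                  # H^(L) and s_L = t_L
    return None                               # retry with a new random s
</syntaxhighlight>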
==== Stage three: variable-shift process ====
The <math>H^{(\lambda+1)}(X)</math> are now generated using the variable shifts <math>s_{\lambda},\quad\lambda=L,L+1,\dots,</math> starting from
:<math display="block">s_L = t_L = s- \frac{P(s)}{\bar H^{(L)}(s)},</math>
the last root estimate of the second stage, and continuing with
:<math display="block">s_{\lambda+1}=s_\lambda- \frac{P(s_\lambda)}{\bar H^{(\lambda+1)}(s_\lambda)}, \quad \lambda=L,L+1,\dots,</math>
where <math>\bar H^{(\lambda+1)}(z)</math> is the normalized ''H'' polynomial, that is <math>H^{(\lambda+1)}(z)</math> divided by its leading coefficient.
 
If the step size in stage three does not fall fast enough to zero, then stage two is restarted using a different random point. If this does not succeed after a small number of restarts, the number of steps in stage two is doubled.
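A sketch of this stage in the same style (tolerance and iteration cap are arbitrary; the published algorithm uses more careful stopping criteria):
<syntaxhighlight lang="python">
def stage_three(p, h, s, tol=1e-14, max_iter=60):
    """Variable-shift process: the shift follows the latest root estimate,
    s_(l+1) = s_l - P(s_l)/Hbar^(l+1)(s_l).  Returns the accepted root, or
    None if progress stalls, in which case stage two is restarted."""
    for _ in range(max_iter):
        h = h_step(p, h, s)                   # H^(l+1), built with shift s_l
        step = horner(p, s)[0] / horner(h, s)[0]
        s -= step
        if abs(step) <= tol * max(abs(s), 1.0):
            return s
    return None
</syntaxhighlight>
A complete solver would chain the three stages, deflate the accepted root by synthetic division, and repeat on the quotient polynomial.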
==What gives the algorithm its power?==
Compare with the [[Newton–Raphson iteration]]
:<math display="block">z_{i+1}=z_i - \frac{P(z_i)}{P^{\prime}(z_i)}.</math>
The iteration uses the given ''P'' and <math>P^{\prime}</math>. In contrast, the third stage of Jenkins–Traub
:<math display="block">
s_{\lambda+1}
=s_\lambda- \frac{P(s_\lambda)}{\bar H^{\lambda+1}(s_\lambda)}
</math>
 
is precisely a Newton–Raphson iteration, performed not on ''P'' itself but on a sequence of [[rational functions]]
:<math display="block">W^\lambda(z)=\frac{P(z)}{H^\lambda(z)}.</math>
For <math>\lambda</math> sufficiently large,
:<math display="block">\frac{P(z)}{\bar H^{\lambda}(z)}=W^\lambda(z)\,LC(H^{\lambda})</math>
is as close as desired to a first-degree polynomial
:<math display="block">z-\alpha_1, \,</math>
where <math>\alpha_1</math> is one of the zeros of <math>P</math>. Even though Stage 3 is precisely a Newton–Raphson iteration, differentiation is not performed.
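This equivalence can be checked numerically. The following sketch, reusing the hypothetical helpers above, compares one Jenkins–Traub step with a Newton step on <math>W^\lambda</math> whose derivative is approximated by a central finite difference (the test polynomial is arbitrary):
<syntaxhighlight lang="python">
# Compare one Jenkins-Traub step with a Newton step on W = P/H.
p = [1.0, -6.0, 11.0, -6.0]              # P(z) = (z-1)(z-2)(z-3)
h = stage_one(p, M=5)                    # some H polynomial
s = 0.9 + 0.1j                           # a point near the smallest root

h_next = h_step(p, h, s)                 # Hbar^(lambda+1)
jt = s - horner(p, s)[0] / horner(h_next, s)[0]

def W(z):
    return horner(p, z)[0] / horner(h, z)[0]

eps = 1e-6
newton = s - W(s) / ((W(s + eps) - W(s - eps)) / (2 * eps))

print(abs(jt - newton))                  # tiny: only finite-difference error
</syntaxhighlight>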
 
=== Analysis of the ''H'' polynomials ===
Let <math>\alpha_1,\dots,\alpha_n</math> be the roots of ''P''(''X''). The so-called Lagrange factors of ''P''(''X'') are the cofactors of these roots,
:<math display="block">P_m(X)=\frac{P(X)-P(\alpha_m)}{X-\alpha_m}.</math>
If all roots are different, then the Lagrange factors form a basis of the space of polynomials of degree at most ''n''&nbsp;&minus;&nbsp;1. Since <math>X\cdot P_m(X)\equiv\alpha_m\cdot P_m(X)\pmod{P(X)}</math>, each step of the recursion simply multiplies the ''m''-th coordinate by <math>(\alpha_m-s_\lambda)^{-1}</math>, so the ''H'' polynomials have the coordinate representation
:<math display="block">
H^{(\lambda)}(X)
=\sum_{m=1}^n
\left[
\prod_{\kappa=0}^{\lambda-1}(\alpha_m-s_\kappa)
\right]^{-1}P_m(X)\,.
</math>
Each Lagrange factor has leading coefficient 1, so that the leading coefficient of the ''H'' polynomials is the sum of these coordinate coefficients. The normalized ''H'' polynomials are thus
:<math display="block">
\bar H^{(\lambda)}(X)
=\frac{\sum_{m=1}^n
\left[
\prod_{\kappa=0}^{\lambda-1}(\alpha_m-s_\kappa)
\right]^{-1}P_m(X)
}{\sum_{m=1}^n
\left[
\prod_{\kappa=0}^{\lambda-1}(\alpha_m-s_\kappa)
\right]^{-1}
}
= \frac{P_1(X)+\sum_{m=2}^n
\left[
\prod_{\kappa=0}^{\lambda-1}\frac{\alpha_1-s_\kappa}{\alpha_m-s_\kappa}
\right]P_m(X)
}{1+\sum_{m=2}^n
\left[
\prod_{\kappa=0}^{\lambda-1}\frac{\alpha_1-s_\kappa}{\alpha_m-s_\kappa}
\right]
}\,.
</math>
 
Under the condition that
:<math display="block">|\alpha_1|<|\alpha_2|=\min{}_{m=2,3,\dots,n}|\alpha_m|</math>
one gets the asymptotic estimates for
*stage 1: <math display="block">
H^{(\lambda)}(X)
=P_1(X)+O\left(\left|\frac{\alpha_1}{\alpha_2}\right|^\lambda\right).
</math>
*for stage 2, if ''s'' is close enough to <math>\alpha_1</math>: <math display="block">
H^{(\lambda)}(X)
= P_1(X)
+O\left(
\left|\frac{\alpha_1}{\alpha_2}\right|^M
\cdot
\left|\frac{\alpha_1-s}{\alpha_2-s}\right|^{\lambda-M}\right)
</math> and <math display="block">
s-\frac{P(s)}{\bar H^{(\lambda)}(s)}
= \alpha_1+O\left(\ldots\cdot|\alpha_1-s|\right).
</math>
*and for stage 3: <math display="block">
H^{(\lambda)}(X)
=P_1(X)
+O\left(
\left|\frac{\alpha_1}{\alpha_2}\right|^M
\cdot
\prod_{\kappa=M}^{\lambda-1}
\left|\frac{\alpha_1-s_\kappa}{\alpha_2-s_\kappa}\right|
\right)
</math> and <math display="block">
s_{\lambda+1}=
s_\lambda-\frac{P(s_\lambda)}{\bar H^{(\lambda+1)}(s_\lambda)}
=\alpha_1+O\left(
\ldots\cdot
\frac{|\alpha_1-s_\lambda|^2}{|\alpha_2-s_\lambda|}
\right),
</math> giving rise to a higher than quadratic convergence order of <math>\phi^2=1+\phi\approx 2.61</math>, where <math>\phi=\tfrac12(1+\sqrt5)</math> is the [[golden ratio]].
 
=== Interpretation as inverse power iteration ===
All stages of the Jenkins–Traub complex algorithm may be represented as the linear algebra problem of determining the eigenvalues of a special matrix. This matrix is the coordinate representation of a linear map in the ''n''-dimensional space of polynomials of degree ''n''&nbsp;&minus;&nbsp;1 or less. The principal idea of this map is to interpret the factorization
:<math display="block">P(X)=(X-\alpha_1)\cdot P_1(X)</math>
with a root <math>\alpha_1\in\C</math> and <math>P_1(X) = P(X) / (X-\alpha_1)</math> the remaining factor of degree ''n''&nbsp;&minus;&nbsp;1 as the eigenvector equation for the multiplication with the variable ''X'', followed by remainder computation with divisor ''P''(''X''),
:<math display="block">M_X(H)=(X\cdot H(X)) \bmod P(X)\,.</math>
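For illustration, this map can be written down as a matrix acting on coefficient vectors over the monomial basis <math>1,X,\dots,X^{n-1}</math>; a small NumPy sketch with an arbitrary example polynomial:
<syntaxhighlight lang="python">
import numpy as np

coeffs = np.array([1.0, -6.0, 11.0, -6.0])   # monic P, descending powers
n = len(coeffs) - 1

# Matrix of M_X(H) = (X * H(X)) mod P(X) on the basis (1, X, ..., X^(n-1)):
# column j holds the coefficient vector of X^(j+1) mod P(X).
A = np.zeros((n, n))
A[1:, :-1] = np.eye(n - 1)                   # X * X^j = X^(j+1) for j < n-1
A[:, -1] = -coeffs[1:][::-1]                 # X^n mod P = -(a_1 X^(n-1) + ... + a_n)

print(np.linalg.eigvals(A))                  # approximately the roots 1, 2, 3 of P
</syntaxhighlight>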
This maps polynomials of degree at most ''n''&nbsp;&minus;&nbsp;1 to polynomials of degree at most ''n''&nbsp;&minus;&nbsp;1. The eigenvalues of this map are the roots of ''P''(''X''), since the eigenvector equation reads
:<math display="block">0 = (M_X-\alpha\cdot id)(H)=((X-\alpha)\cdot H) \bmod P\,,</math>
which implies that <math>(X-\alpha)\cdot H(X)=C\cdot P(X)</math>, that is, <math>(X-\alpha)</math> is a linear factor of ''P''(''X''). In the monomial basis the linear map <math>M_X</math> is represented by a [[companion matrix]] of the polynomial ''P'', as
:<math display="block"> M_X(H)=\sum_{m=1}^{n-1}(H_{m-1}-P_{m}H_{n-1})X^m-P_0H_{n-1}\,,</math>
the resulting coefficient matrix is
:<math display="block">A=\begin{pmatrix}
0 & 0 & \dots & 0 & -P_0 \\
1 & 0 & \dots & 0 & -P_1 \\