Revision as of 04:53, 1 January 2025 edit 12.75.128.58 (talk) corrected false comment about Newton's method (prior editor didn't figure in the derivative calculation); moved into footnote ← Previous edit		Revision as of 23:15, 3 February 2025 edit undo LucasBrown (talk \| contribs) Extended confirmed users 63,051 edits m MOS:", spacing, equation numbering, and minor grammar tweaks Tag: 2017 wikitext editor Next edit →
Line 9: ==Simple description== The simplest form of the formula for Steffensen's method occurs when it is used to find a [[zero of a function\|zero]] of a [[real function]] <math>\ f~~\ ;~~</math>; that is, to find the real value <math>x_\star</math> that satisfies <math>\ f(x_\star) = 0 ~.</math>. Near the solution <math>\ x_\star~~\, ,~~</math>, the derivative of the function, <math>\ f'~~\ ,~~</math>, is supposed to approximately satisfy <math>\ -1 < f'(x_\star) < 0~~\ ;~~</math>; this condition ensures that <math>\ f\ </math> is an adequate correction-function for <math>\ x~~\ ,~~</math>, for finding its ''own'' solution, although it is not required to work efficiently. For some functions, Steffensen's method can work even if this condition is not met, but in such a case, the starting value <math>\ x_0\ </math> must be ''very'' close to the actual solution <math>\ x_\star~~\ ,~~</math>, and convergence to the solution may be slow. Adjustment of the size of the method's intermediate step, mentioned later, can improve convergence in some of these cases. Given an adequate starting value <math>\ x_0~~\, ,~~</math>, a sequence of values <math>\ x_0,\ x_1,\ x_2,\ \dots,\ x_n,\ \dots\ </math> can be generated using the formula below. When it works, each value in the sequence is much closer to the solution <math>\ x_\star\ </math> than the prior value. The value <math>\ x_n\ </math> from the current step generates the value <math>\ x_{n+1}\ </math> for the next step, via ~~this~~the formula:<ref name=Dahlquist-Björck-1974/> :<math>\ x_{n+1} = x_n - \frac{\ f(x_n)\ }{g(x_n)}\ </math> for <math>\ n = 0, 1, 2, 3, ...~~\ ;~~</math>, where the slope function <math>\ g(x)\ </math> is a composite of the original function <math>\ f\ </math> given by the ~~following~~ formula: :<math>\ g(x) = \frac{\, f\bigl( x + f(x) \bigr) \,}{f(x)} - 1\ </math> or perhaps more clearly, :<math>\ g(x) ~=~ \frac{\ f(x + h) - f(x) \ }{h}\ ~~\approx~~ \frac{\ \operatorname{d}f( x )\ }{ \operatorname{d}x } ~\equiv~ f'( x )\ ,</math> where <math>\ h = f(x)\ </math> is a step-size between the last iteration point, <math>\ x~~\ ,~~</math>, and an auxiliary point located at <math>\ x + h ~.</math>. Technically, the function <math>\ g\ </math> is called the first-order [[divided difference]] of <math>\ f\ </math> between those two points (~~ ~~it is either a ''forward''-type or ''backward''-type divided difference, depending on the sign {{nobr\|of <math>\ h\ </math>).}} Practically, it is the averaged value of the slope <math>\ f'\ </math> of the function <math>\ f\ </math> between the last sequence point <math>\ \left( x, y \right) = \bigl( x_n,\ f\left( x_n \right) \bigr)\ </math> and the auxiliary point at <math>\ \bigl( x, y \bigr) = \bigl(\ x_n + h,\ f\left( x_n + h \right)\ \bigr)~~\ ,~~</math>, with the size of the intermediate step (and its direction) given by <math>\ h = f(x_n) ~.</math>. Because the value of <math>\ g\ </math> is an approximation for <math>\ f'~~\ ,~~</math>, its value can optionally be checked to see if it meets the condition <math>\ -1 < g < 0\ </math>, which is required ~~of <math>\ f'\ ,</math>~~ to guarantee convergence of Steffensen's algorithm. Although slight non-conformance may not necessarily be dire, any large departure from the condition warns that Steffensen's method is liable to fail, and temporary use of some fallback algorithm is warranted (e.g. the more robust [[Illinois algorithm]], or plain [[regula falsi]]). It is only for the purpose of finding <math>\ h\ </math> for this auxiliary point that the value of the function <math>\ f\ </math> must be an adequate correction to get closer to its own solution, and for that reason fulfill the requirement that <math>\ -1 < f'(x_\star) < 0 ~.</math>. For all other parts of the calculation, Steffensen's method only requires the function <math>\ f\ </math> to be continuous and to actually have a nearby solution.<ref name=Dahlquist-Björck-1974/> Several modest modifications of the step <math>\ h\ </math> in the formula for the slope <math>\ g\ </math> exist, such as multiplying it by {{sfrac\| 1 \|2}} or {{sfrac\| 3 \|4}}, to accommodate functions <math>\ f\ </math> that do not quite meet the requirement. ==Advantages and drawbacks== The main advantage of Steffensen's method is that it has [[quadratic convergence]]<ref name=Dahlquist-Björck-1974/> like [[Newton's method]] – that is, both methods find roots to an equation <math>~ f ~</math> just as '"quickly'". In this case, ''quickly'' means that for both methods, the number of correct digits in the answer doubles with each step. But the formula for Newton's method requires evaluation of the function's derivative <math>~ f' ~</math> as well as the function <math>~ f ~,</math>, while Steffensen's method only requires <math>~ f ~</math> itself. This is important when the derivative is not easily or efficiently available. The price for the quick convergence is the double function evaluation: Both <math>~ f(x_n) ~</math> and <math>~ f(x_n + h) ~</math> must be calculated, which might be time-consuming if <math>~ f ~</math> is a complicated ~~function~~. For comparison, the [[secant method]] needs only one function evaluation per step. The secant method increases the number of correct digits by "only" a factor of roughly 1.6 per step, but one can do twice as many steps of the secant method within a given time. Since the secant method can carry out twice as many steps in the same time as Steffensen's method,{{efn\| Because <math>~ f( x_n + h )~</math> requires the prior calculation of <math>~ h = f(x_n) ~,</math>, the two evaluations must be done sequentially – the algorithm ''per se'' cannot be made faster by running the function evaluations in parallel. This is yet another disadvantage of Steffensen's method. }} in practical use the secant method actually converges faster than Steffensen's method, when both algorithms succeed: ~~The~~the secant method achieves a factor of about {{nowrap\|(1.6)<sup>2</sup> ≈ 2.6 times}} as many digits for every two steps (two function evaluations), compared to Steffensen's factor of 2 for every one step (two function evaluations).▼ }} ▲in practical use the secant method actually converges faster than Steffensen's method, when both algorithms succeed: The secant method achieves a factor of about {{nowrap\|(1.6)<sup>2</sup> ≈ 2.6 times}} as many digits for every two steps (two function evaluations), compared to Steffensen's factor of 2 for every one step (two function evaluations). Similar to most other [[Root-finding algorithm#Iterative methods\|iterative root-finding algorithms]], the crucial weakness in Steffensen's method is choosing an '"adequate'" starting value <math>~ x_0 ~.</math>. If the value of <math>~ x_0 ~</math> is not '"close enough'" to the actual solution <math>~ x_\star ~,</math>, the method may fail, and the sequence of values <math>~ x_0, \, x_1, \, x_2, \, x_3, \, \dots ~</math> may either erratically flip-flop between two extremes, or diverge to infinity, or both. ==Derivation using Aitken's delta-squared process== The version of Steffensen's method implemented in the [[MATLAB]] code shown below can be found using ~~the~~ [[Aitken's delta-squared process]] for ~~accelerating~~[[Series acceleration\|convergence ~~of a sequence~~acceleration]]. To compare the following formulae to the formulae in the section above, notice that <math>x_n = p \, - \, p_n ~.</math>. This method assumes starting with a linearly convergent sequence and increases the rate of convergence of that sequence. If the signs of <math>~ p_n, \, p_{n+1}, \, p_{n+2} ~</math> agree and <math>~ p_n ~</math> is '"sufficiently close'" to the desired limit of the sequence <math>~ p ~,</math>, then we can assume ~~the following:~~ :<math>\frac{\, p_{n+1} - p \,}{\, p_n - p \,} ~ \approx ~ \frac{\, p_{n+2} - p \,}{\, p_{n+1} - p \},} </math> so that ~~then~~ :<math> ( p_{n+12} - ~~p)^~~2 ~p_{n+1} ~~\approx~~+ ~p_n ) p (\,approx p_{n+2} ~~- p \,)\,(\,~~ p_n - ~~p \, ) ~~~p_{n+1}^2.</math> so ~~:<math> p_{n+1}^2 - 2 \, p_{n+1} \, p + p^2 ~ \approx ~ p_{n+2} \; p_n - (\, p_n + p_{n+2} \,) \, p + p^2 </math>~~ ~~and hence~~ ~~:<math>(\, p_{n+2} - 2 \, p_{n+1} + p_n \,) \, p ~ \approx ~ p_{n+2} \, p_n - p_{n+1}^2 ~.</math>~~ Solving for the desired limit of the sequence <math>~ p ~</math> gives: :<math> p ~ \approx ~ \frac{\, p_{n+2} \, p_n - p_{n+1}^2 \,}{\, p_{n+2} - 2 \, p_{n+1} + p_n ~~\,} ~=~ \frac{\, p_{n}^2 + p_{n} \, p_{n+2} + 2 \, p_{n} \, p_{n+1} - 2 \, p_{n} \, p_{n+1} - p_{n}^2 - p_{n+1}^2 \,}{\, p_{n+2} - 2 \, p_{n+1} + p_n \,~~}</math> :<math> =~ \frac{\, (\, p_{n}^2 + p_{n} \, p_{n+2} - 2 \, p_{n} \, p_{n+1} \,) - (\, p_{n}^2 - 2 \, p_{n} \, p_{n+1} + p_{n+1}^2 \, ) \, }{\, p_{n+2} - 2 \, p_{n+1} + p_n \,}</math> :<math> =~ p_n - \frac{\,(\, p_{n+1} - p_n )^2 \,}{\, p_{n+2} - 2 \, p_{n+1} + p_n \,} ~,</math> which results in the more rapidly convergent sequence: :<math> p ~ \approx ~ p_{n+3} ~=~ p_n - \frac{ \, ( \, p_{n+1} - p_n \, )^2 \, }{ \, p_{n+2} - 2 \, p_{n+1} + p_n \,} ~.</math> == Code example == Line 139 ⟶ 134: ==Generalization to Banach space== Steffensen's method can also be used to find an input <math>\ x = x_\star\ </math> for a different kind of function <math>\ F\ </math> that produces output the same as its input: <math>\ x_\star = F(x_\star)\ </math> for the special value <math>\ x_\star ~.</math>. Solutions like <math>\ x_\star\ </math> are called ''[[fixed point (mathematics)\|fixed point]]s''. Many of these functions can be used to find their own solutions by repeatedly recycling the result back as input, but the rate of convergence can be slow, or the function can fail to converge at all, depending on the individual function. Steffensen's method accelerates this convergence, to make it [[quadratic convergence\|quadratic]]. For orientation, the root function <math>\ f\ </math> and the fixed-point functions are simply related by <math>\ F(x) = x + \varepsilon\ f(x)~~\ ,~~</math>, where <math>\ \varepsilon\ </math> is some scalar constant small enough in magnitude to make <math>\ F\ </math> stable under iteration, but large enough for the [[non-linearity]] of the function <math>\ f\ </math> to be appreciable. All issues of a more general [[Banach space]] vs. basic [[real numbers]] being momentarily ignored for the sake of the comparison. This method for finding fixed points of a real-valued function has been ~~generalised~~generalized for functions <math>\ F : X \to X\ </math> that map a [[Banach space]] <math>\ X\ </math> onto itself or even more generally <math>\ F : X \to Y\ </math> that map from one [[Banach space]] <math>\ X\ </math> into another [[Banach space]] <math>\ Y ~.</math>. The generalized method assumes that a [[Indexed family\|family]] of [[Bounded set\|bounded]] [[linear operators]] <math>\ \{\; G(u,v): u, v \in X \;\}\ </math> associated with <math>\ u\ </math> and <math>\ v\ </math> can be devised that (locally) satisfies ~~this~~the condition:<ref name=Johnson-Scholz-1968/> :<math>\ F\left( u \right) - F\left( v \right) = G\left( u, v \right) \, \bigl[\, u - v \,\bigr]~~\ \qquad\qquad\qquad\qquad~~ </math> {{right\|{{nobr\| {{grey\|eqn. [[Coronis (textual symbol)\|{{big\|({{math\|⸎}})}}]]}} }} }}{{anchor\|⸎}} {{NumBlk\|:\|<math>~~\ G~~F\left( u~~, v~~ \right) ~~= \bigl[\~~- F\left( uv \right)- F= G\left( u, v \right)~~\ \bigr]~~ \bigl[\ u - v\ \bigr]~~^{-1}\~~ ,</math>\|{{EquationRef\|1}}}}▼ ''If division is possible'' in the [[Banach space]], the linear operator <math>\ G\ </math> can be obtained from▼ ▲:<math>\ G\left( u, v \right) = \bigl[\ F\left( u \right)- F\left( v \right)\ \bigr] \bigl[\ u - v\ \bigr]^{-1}\ ,</math> which may provide some insight: Expressed in this way, the linear operator <math>\ G\ </math> can be more easily seen to be an elaborate version of the [[divided difference]] <math>\ g\ </math> discussed in the first section, above. The quotient form is shown here for orientation only; it is ''not'' required ''per se''. Note also that division within the Banach space is not necessary for the elaborated Steffensen's method to be viable; the only requirement is that the operator <math>\ G\ </math> satisfy the [[#⸎\|equation marked]] with the [[Coronis (textual symbol)\|coronis]], {{big\|({{math\|⸎}})}}.▼ ▲''If division is possible'' in the [[Banach space]], then the linear operator <math>\ G\ </math> can be obtained from For the [[Function of a real variable\|basic real number function]] <math>\ f\ ,</math> given in the first section, the function simply takes in and puts out real numbers. There, the function <math>\ g\ </math> is a ''[[divided difference]]''. In the generalized form here, the operator <math>\ G\ </math> is the analogue of a divided difference for use in the [[Banach space]]. The operator <math>\ G\ </math> is roughly equivalent to a [[Matrix (mathematics)\|matrix]] whose entries are all functions of [[vector (mathematics)\|vector]] [[Argument of a function\|arguments]] <math>\ u\ </math> and <math>\ v ~.</math> ▼ :<math>G\left( u, v \right) = \bigl[ F\left( u \right)- F\left( v \right) \bigr] \bigl[ u - v \bigr]^{-1},</math> ▲which may provide some insight: Expressed in this way, the linear operator <math>\ G\ </math> can be more easily seen to be an elaborate version of the [[divided difference]] <math>\ g\ </math> discussed in the first section, above. The quotient form is shown here for orientation only; it is ''not'' required ''per se''. Note also that division within the Banach space is not necessary for the elaborated Steffensen's method to be viable; the only requirement is that the operator <math>\ G\ </math> satisfy ~~the [[#⸎\|equation marked]] with the [[Coronis~~ (~~textual symbol)\|coronis]],~~ {{~~big~~EquationNote\|~~({{math\|⸎~~1}})}}. ▲For the [[Function of a real variable\|basic real number function]] <math>\ f~~\ ,~~</math>, given in the first section, the function simply takes in and puts out real numbers. There, the function <math>\ g\ </math> is a ''[[divided difference]]''. In the generalized form here, the operator <math>\ G\ </math> is the analogue of a divided difference for use in the [[Banach space]]. The operator <math>\ G\ </math> is roughly equivalent to a [[Matrix (mathematics)\|matrix]] whose entries are all functions of [[vector (mathematics)\|vector]] [[Argument of a function\|arguments]] <math>\ u\ </math> and <math>\ v ~.</math> . Steffensen's method is then very similar to the Newton's method, except that it uses the divided difference <math>\ G \bigl( \, F\left( x \right), \, x \,\bigr)\ </math> instead of the derivative <math>\ F'(x) ~.</math> Note that for arguments <math>\ x\ </math> close to some fixed point <math>\ x_\star\ ,</math> fixed point functions <math>\ F\ </math> and their linear operators <math>\ G\ </math> meeting the [[#⸎\|condition marked {{big\|({{math\|⸎}})}}]], <math>\ F'(x) \approx G \bigl( \, F\left( x \right), \, x \,\bigr) \approx I\ </math> where <math>\ I\ </math> is the identity operator.▼ ▲Steffensen's method is then very similar to the Newton's method, except that it uses the divided difference <math>\ G \bigl( \, F\left( x \right)~~, \~~, x \,\bigr)\ </math> instead of the derivative <math>\ F'(x) ~.</math>. Note that for arguments <math>\ x\ </math> close to some fixed point <math>\ x_\star~~\ ,~~</math>, fixed point functions <math>\ F\ </math> and their linear operators <math>\ G\ </math> meeting ~~the [[#⸎\|~~condition ~~marked {{big\|~~({{~~math~~EquationNote\|⸎1}})~~}}]]~~, <math>\ F'(x) \approx G \bigl( \, F\left( x \right)~~, \~~, x \,\bigr) \approx I\ </math>, where <math>\ I\ </math> is the identity operator. In the case that division is possible in the Banach space, the generalized iteration formula is given by : <math> x_{n+1} = x_n + \Bigl[ I - G\bigl( F\left( x_n \right), x_n \bigr) \Bigr]^{-1}\Bigl[ F\left( x_n \right) - x_n \Bigr]\ ,</math> for <math>\ n = 1,\,2,\,3,\,..~~. ~~~.</math>. In the more general case in which division may not be possible, the iteration formula requires finding a solution <math>\ x_{n+1}\ </math> close to <math>\ x_{n}\ </math> for which : <math> \Bigl[ I - G\bigl( F\left( x_n \right), x_n \bigr) \Bigr] \Bigl[ x_{n+1} - x_n \Bigr] = F\left( x_n \right) - x_n ~.</math> Equivalently, one may seek the solution <math>\ x_{n+1}\ </math> to the somewhat reduced form : <math> \Bigl[ I - G\bigl( F\left( x_n \right), x_n \bigr) \Bigr] x_{n+1} = \Bigl[ F\left( x_n \right) - G\bigl( F\left( x_n \right), x_n \bigr) \ x_n \Bigr]\ ,</math> with all the values inside square brackets being independent of <math>\ x_{n+1}~~\ :~~</math>: ~~The~~the bracketed terms all only depend on <math>\ x_{n} ~.</math>. ~~Be aware, however~~However, that the second form may not be as numerically stable as the first: ~~Because~~because the first form involves finding a value for a (hopefully) small difference, it may be numerically more likely to avoid excessively large or erratic changes to the iterated value <math>\ x_n ~.</math>. If the linear operator <math>~ G ~</math> satisfies : <math> \Bigl\\| G \left( u, v \right) - G \left( x, y \right) \Bigr\\| \le k \biggl( \Bigl\\|u - x \Bigr\\| + \Bigr\\| v - y \Bigr\\| \biggr) ~</math> for some positive real constant <math>\ k ~,</math>, then the method converges quadratically to a fixed point of <math>\ F\ </math> if the initial approximation <math>\ x_0\ </math> is '"sufficiently close'" to the desired solution <math>\ x_\star~~\ ,~~</math> that satisfies <math>\ x_\star = F(x_\star) ~.</math>. ==Notes==

Steffensen's method: Difference between revisions