Paradoctor (talk | contribs) MOS:LINKSPECIFITY, if the supertopic is desired instead, please adapt link text accordingly
m →Choosing the step size and descent direction: Overshoot
Line 40:
=== Choosing the step size and descent direction ===
Since a step size <math>\gamma</math> that is too small slows convergence, and one that is too large leads to overshoot and divergence, finding a good setting of <math>\gamma</math> is an important practical problem. [[Philip Wolfe (mathematician)|Philip Wolfe]] also advocated using "clever choices of the [descent] direction" in practice.<ref>{{cite journal |last1=Wolfe |first1=Philip |title=Convergence Conditions for Ascent Methods |journal=SIAM Review |date=April 1969 |volume=11 |issue=2 |pages=226–235 |doi=10.1137/1011036 }}</ref> Whilst using a direction that deviates from the steepest descent direction may seem counter-intuitive, the idea is that the smaller slope may be compensated for by being sustained over a much longer distance.
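The effect of the step size can be illustrated with a minimal sketch. The example below is an assumption made for illustration and is not taken from the cited source: it applies fixed-step gradient descent to the one-dimensional function <math>f(x) = x^2</math>, whose gradient is <math>2x</math>, so that each iteration multiplies <math>x</math> by <math>1 - 2\gamma</math>. The function name <code>run_gradient_descent</code> and the three values of <math>\gamma</math> are hypothetical choices.

```python
def run_gradient_descent(gamma, x0=1.0, steps=20):
    """Fixed-step gradient descent on f(x) = x**2 (illustrative example only).

    Returns the list of iterates, starting with x0.
    """
    x = x0
    iterates = [x]
    for _ in range(steps):
        grad = 2.0 * x          # gradient of f(x) = x**2
        x = x - gamma * grad    # steepest-descent update with step size gamma
        iterates.append(x)
    return iterates


if __name__ == "__main__":
    # Hypothetical step sizes: too small, reasonable, too large.
    for gamma in (0.01, 0.4, 1.1):
        final = run_gradient_descent(gamma)[-1]
        print(f"gamma={gamma}: |x| after 20 steps = {abs(final):.3e}")
```

In this setting the iterates shrink towards the minimiser at <math>x = 0</math> only when <math>|1 - 2\gamma| < 1</math>, that is, for <math>0 < \gamma < 1</math>. A very small <math>\gamma</math> makes slow progress, while <math>\gamma > 1</math> makes each step overshoot the minimiser by more than the previous distance, so the iterates diverge.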
To reason about this mathematically, consider a direction <math>\mathbf{p}_n</math> and a step size <math>\gamma_n</math>, together with the more general update: