Content deleted Content added
→Conjugate directions: remove word |
|||
(One intermediate revision by one other user not shown) | |||
Line 40:
If the directions <math>\boldsymbol{p}_0,\boldsymbol{p}_1,\boldsymbol{p}_2,\ldots</math> are not picked well, then progress will be slow. In particular, the gradient descent method would be slow. This can be seen in the diagram, where the green line is the result of always picking the local gradient direction. It zig-zags towards the minimum, but repeatedly overshoots. In contrast, if we pick the directions to be a set of '''mutually conjugate directions''', then there will be no overshoot, and we would obtain the global minimum after <math>n</math> steps, where <math>n</math> is the number of dimensions.
[[File:Conjugate_Diameters.svg|right|thumb|300x300px|Two conjugate diameters of an [[ellipse]]. Each edge of the bounding [[parallelogram]] is [[Parallel (geometry)|parallel]] to one of the diameters.]]
The concept of conjugate directions came from classical geometry of ellipse. For an ellipse, two semi-axes
Note that we need to scale each directional vector <math>\boldsymbol{p}_i </math> by a scalar <math>t_i </math>, so that <math>\boldsymbol{c} + t_i \boldsymbol{p}_i</math> falls exactly on the ellipsoid.
Line 134:
\end{cases}</math>.
'''Proof.''' By construction, we have <math>\mathbf{r}_{i+1} = \mathbf{r}_i - \
|