Derivation of the conjugate gradient method: Difference between revisions

Content deleted Content added
 
(One intermediate revision by one other user not shown)
Line 40:
If the directions <math>\boldsymbol{p}_0,\boldsymbol{p}_1,\boldsymbol{p}_2,\ldots</math> are not picked well, then progress will be slow. In particular, the gradient descent method would be slow. This can be seen in the diagram, where the green line is the result of always picking the local gradient direction. It zig-zags towards the minimum, but repeatedly overshoots. In contrast, if we pick the directions to be a set of '''mutually conjugate directions''', then there will be no overshoot, and we would obtain the global minimum after <math>n</math> steps, where <math>n</math> is the number of dimensions.
[[File:Conjugate_Diameters.svg|right|thumb|300x300px|Two conjugate diameters of an [[ellipse]]. Each edge of the bounding [[parallelogram]] is [[Parallel (geometry)|parallel]] to one of the diameters.]]
The concept of conjugate directions came from classical geometry of ellipse. For an ellipse, two semi-axes center are mutually conjugate with respect to the ellipse iff the lines are parallel to the tangent bounding parallelogram, as pictured. The concept generalizes to ''n''-dimensional ellipsoids, where ''n'' semi-axes <math>t_0\boldsymbol{p}_0, \dots, t_{n-1}\boldsymbol{p}_{n-1}</math> are mutually conjugate with respect to the ellipsoid iff each axis is parallel to the tangent bounding [[parallelepiped]]. In other words, for any <math>i</math>, the tangent plane to the ellipsoid at <math>\boldsymbol{c} + t_i \boldsymbol{p}_i</math> is a hyperplane spanned by the vectors <math>\{\boldsymbol{p}_j : j \neq i\}</math>, where <math>\boldsymbol{c} </math> is the center of the ellipsoid.
 
Note that we need to scale each directional vector <math>\boldsymbol{p}_i </math> by a scalar <math>t_i </math>, so that <math>\boldsymbol{c} + t_i \boldsymbol{p}_i</math> falls exactly on the ellipsoid.
Line 134:
\end{cases}</math>.
 
'''Proof.''' By construction, we have <math>\mathbf{r}_{i+1} = \mathbf{r}_i - \alpha_kalpha_i \mathbf{A p}_i</math>, thus<math display="block">\boldsymbol r_{k+1}^T \boldsymbol A \boldsymbol p_i = \boldsymbol r_{k+1}^T \frac{\boldsymbol r_{i} -\boldsymbol r_{i+1}}{\alpha_i}</math>Now apply lemma 1.