Derivation of the conjugate gradient method: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 23:22, 5 August 2024 edit Cosmia Nebula (talk \| contribs) Extended confirmed users 11,317 edits →Gram–Schmidt process Tag: Visual edit ← Previous edit		Latest revision as of 02:58, 17 June 2025 edit undo Shlomi Hillel (talk \| contribs) 275 edits →Conjugate directions: remove word Tag: 2017 wikitext editor
(One intermediate revision by one other user not shown)
Line 40: If the directions <math>\boldsymbol{p}_0,\boldsymbol{p}_1,\boldsymbol{p}_2,\ldots</math> are not picked well, then progress will be slow. In particular, the gradient descent method would be slow. This can be seen in the diagram, where the green line is the result of always picking the local gradient direction. It zig-zags towards the minimum, but repeatedly overshoots. In contrast, if we pick the directions to be a set of '''mutually conjugate directions''', then there will be no overshoot, and we would obtain the global minimum after <math>n</math> steps, where <math>n</math> is the number of dimensions. [[File:Conjugate_Diameters.svg\|right\|thumb\|300x300px\|Two conjugate diameters of an [[ellipse]]. Each edge of the bounding [[parallelogram]] is [[Parallel (geometry)\|parallel]] to one of the diameters.]] The concept of conjugate directions came from classical geometry of ellipse. For an ellipse, two semi-axes ~~center~~ are mutually conjugate with respect to the ellipse iff the lines are parallel to the tangent bounding parallelogram, as pictured. The concept generalizes to ''n''-dimensional ellipsoids, where ''n'' semi-axes <math>t_0\boldsymbol{p}_0, \dots, t_{n-1}\boldsymbol{p}_{n-1}</math> are mutually conjugate with respect to the ellipsoid iff each axis is parallel to the tangent bounding [[parallelepiped]]. In other words, for any <math>i</math>, the tangent plane to the ellipsoid at <math>\boldsymbol{c} + t_i \boldsymbol{p}_i</math> is a hyperplane spanned by the vectors <math>\{\boldsymbol{p}_j : j \neq i\}</math>, where <math>\boldsymbol{c} </math> is the center of the ellipsoid. Note that we need to scale each directional vector <math>\boldsymbol{p}_i </math> by a scalar <math>t_i </math>, so that <math>\boldsymbol{c} + t_i \boldsymbol{p}_i</math> falls exactly on the ellipsoid. Line 134: \end{cases}</math>. '''Proof.''' By construction, we have <math>\mathbf{r}_{i+1} = \mathbf{r}_i - \~~alpha_k~~alpha_i \mathbf{A p}_i</math>, thus<math display="block">\boldsymbol r_{k+1}^T \boldsymbol A \boldsymbol p_i = \boldsymbol r_{k+1}^T \frac{\boldsymbol r_{i} -\boldsymbol r_{i+1}}{\alpha_i}</math>Now apply lemma 1.