In [[numerical linear algebra]], the [[conjugate gradient method]] is an [[iterative method]] for numerically solving the linear system
:<math>\boldsymbol{Ax}=\boldsymbol{b}</math>
where <math>\boldsymbol{A}</math> is [[Symmetric matrix|symmetric]] [[Positive-definite matrix|positive-definite]], without computing <math>\boldsymbol{A}^{-1}</math> explicitly. The conjugate gradient method can be derived from several different perspectives, including specialization of the [[conjugate direction method]]<ref>Conjugate Direction Methods http://user.it.uu.se/~matsh/opt/f8/node5.html</ref> for [[Optimization (mathematics)|optimization]], and variation of the [[Arnoldi iteration|Arnoldi]]/[[Lanczos iteration|Lanczos]] iteration for [[eigenvalue]] problems.
The intent of this article is to document the important steps in these derivations.
== Derivation from the conjugate direction method ==
The conjugate gradient method can be seen as a special case of the conjugate direction method applied to minimization of the quadratic function
:<math>f(\boldsymbol{x})=\boldsymbol{x}^\mathrm{T}\boldsymbol{A}\boldsymbol{x}-2\boldsymbol{b}^\mathrm{T}\boldsymbol{x}\text{,}</math>
which allows us to apply geometric intuition. Since <math>\boldsymbol{A}</math> is symmetric positive-definite, the gradient <math>\nabla f(\boldsymbol{x})=2(\boldsymbol{Ax}-\boldsymbol{b})</math> vanishes exactly at the solution of <math>\boldsymbol{Ax}=\boldsymbol{b}</math>, so minimizing <math>f</math> solves the linear system.
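As a quick numerical illustration (a minimal sketch, not part of the derivation; the random matrix and vector below are arbitrary examples), one can check that the minimizer of <math>f</math> coincides with the solution of <math>\boldsymbol{Ax}=\boldsymbol{b}</math>:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
A = M @ M.T + 3 * np.eye(3)        # a random symmetric positive-definite matrix
b = rng.standard_normal(3)

f = lambda x: x @ A @ x - 2 * b @ x

x_star = np.linalg.solve(A, b)     # the shared center of the contour ellipsoids
# f is larger at any nearby perturbed point, consistent with x_star
# being the global minimizer of f
for _ in range(100):
    assert f(x_star) < f(x_star + 0.1 * rng.standard_normal(3))
</syntaxhighlight>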
=== Geometric interpretation ===
[[File:Conjugate gradient illustration.svg|thumb|A comparison of the convergence of gradient descent with optimal step size (in green) and conjugate directions (in red) for minimizing a quadratic function.]]
Geometrically, the quadratic function can be equivalently presented by writing down its value at every point in space. The points of equal value make up its contour surfaces, which are concentric [[Ellipsoid|ellipsoids]] with the equation <math display="block">\boldsymbol{x}^\mathrm{T}\boldsymbol{A}\boldsymbol{x}-2\boldsymbol{b}^\mathrm{T}\boldsymbol{x} = C </math>for varying <math>C</math>. As <math>C</math> decreases, the ellipsoids shrink, until at the minimal value of <math>C</math> the ellipsoid degenerates to a single point: the shared center of the family.
Minimizing the quadratic function is then the problem of moving around the space, searching for the shared center of all those ellipsoids. The center can be found by computing <math>\boldsymbol{A}^{-1}</math> explicitly, but this is precisely what we are trying to avoid.
The simplest method is greedy [[line search]], where we start at some point <math>\boldsymbol{x}_0</math>, pick a direction <math>\boldsymbol{p}_0</math> somehow, then minimize <math>f(\boldsymbol{x}_0 + \alpha_0 \boldsymbol{p}_0)</math> over <math>\alpha_0</math>. This has a simple closed-form solution that does not involve matrix inversion:<math display="block">\alpha_0 = \frac{\boldsymbol{p}_0^\mathrm{T}(\boldsymbol{b}-\boldsymbol{Ax}_0) }{\boldsymbol{p}_0^\mathrm{T}\boldsymbol{A}\boldsymbol{p}_0} </math>Geometrically, we start at some point <math>\boldsymbol{x}_0</math> on some ellipsoid, then choose a direction and travel along it until we hit the point where <math>f</math> is minimized along that direction. This is not necessarily the minimum, but it is progress towards it. Visually, it is moving along a line, and stopping as soon as we reach a point tangent to the contour ellipsoid.
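The following is a minimal sketch of a single exact line-search step, using the closed form for <math>\alpha_0</math> above (the concrete <code>A</code>, <code>b</code>, <code>x0</code> and <code>p0</code> here are illustrative placeholders):
<syntaxhighlight lang="python">
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])               # symmetric positive-definite
b = np.array([1.0, 2.0])
x0 = np.zeros(2)
p0 = b - A @ x0                          # e.g. the steepest-descent direction

# closed-form step length: minimizes f(x0 + alpha * p0) over alpha
alpha0 = p0 @ (b - A @ x0) / (p0 @ A @ p0)
x1 = x0 + alpha0 * p0                    # point of tangency with a contour ellipsoid
</syntaxhighlight>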
We can now repeat this procedure, starting at our new point <math>\boldsymbol{x}_1 = \boldsymbol{x}_0 + \alpha_0\boldsymbol{p}_0 </math>, picking a new direction <math>\boldsymbol{p}_1</math>, and so on.

We can summarize this as the following algorithm: one starts with an initial guess <math>\boldsymbol{x}_0</math> and the corresponding residual <math>\boldsymbol{r}_0=\boldsymbol{b}-\boldsymbol{Ax}_0</math>, and computes the iterate and residual by the formulae
:<math>\begin{align}
\alpha_i&=\frac{\boldsymbol{p}_i^\mathrm{T}\boldsymbol{r}_i}{\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_i}\text{,}\\
\boldsymbol{x}_{i+1}&=\boldsymbol{x}_i+\alpha_i\boldsymbol{p}_i\text{,}\\
\boldsymbol{r}_{i+1}&=\boldsymbol{r}_i-\alpha_i\boldsymbol{Ap}_i
\end{align}</math>
where <math>\boldsymbol{p}_0,\boldsymbol{p}_1,\boldsymbol{p}_2,\ldots</math> are the chosen search directions.
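In code, the iteration might look like the following sketch (assuming the search directions are supplied as the columns of a matrix <code>P</code>; how to choose them is the subject of the next subsection):
<syntaxhighlight lang="python">
import numpy as np

def conjugate_direction(A, b, x0, P):
    """Minimize f along each column of P in turn via exact line search."""
    x = x0.astype(float).copy()
    r = b - A @ x                        # initial residual r_0
    for i in range(P.shape[1]):
        p = P[:, i]
        Ap = A @ p
        alpha = (p @ r) / (p @ Ap)       # alpha_i
        x = x + alpha * p                # x_{i+1}
        r = r - alpha * Ap               # r_{i+1}
    return x
</syntaxhighlight>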
=== Conjugate directions ===
If the directions <math>\boldsymbol{p}_0,\boldsymbol{p}_1,\boldsymbol{p}_2,\ldots</math> are not picked well, then progress will be slow. In particular, the gradient descent method, which always picks the local gradient direction, would be slow. This can be seen in the diagram, where the green line is the result of always picking the local gradient direction: it zig-zags towards the minimum, but repeatedly overshoots. In contrast, if we pick the directions to be a set of '''mutually conjugate directions''', then there will be no overshoot, and we obtain the global minimum after <math>n</math> steps, where <math>n</math> is the number of dimensions.
[[File:Conjugate_Diameters.svg|right|thumb|300x300px|Two conjugate diameters of an [[ellipse]]. Each edge of the bounding [[parallelogram]] is [[Parallel (geometry)|parallel]] to one of the diameters.]]
The concept of conjugate directions comes from the classical geometry of the ellipse. For an ellipse, two semi-axes are mutually conjugate with respect to the ellipse iff each is parallel to one pair of edges of the tangent bounding parallelogram, as pictured. The concept generalizes to ''n''-dimensional ellipsoids, where ''n'' semi-axes <math>\boldsymbol{p}_0, \dots, \boldsymbol{p}_{n-1}</math> are mutually conjugate with respect to the ellipsoid iff each axis is parallel to a face of the tangent bounding [[parallelepiped]]. In other words, for any <math>i</math>, the tangent plane to the ellipsoid at <math>\boldsymbol{c} + \boldsymbol{p}_i</math> is a hyperplane spanned by the vectors <math>\{\boldsymbol{p}_j : j \neq i\}</math>, where <math>\boldsymbol{c} </math> is the center of the ellipsoid.
Given an ellipsoid with equation <math display="block">\boldsymbol{x}^\mathrm{T}\boldsymbol{A}\boldsymbol{x}-2\boldsymbol{b}^\mathrm{T}\boldsymbol{x} = C </math>for some constant <math>C </math>, we can translate it so that its center is at the origin. This changes the equation to <math display="block">\boldsymbol{x}^\mathrm{T}\boldsymbol{A}\boldsymbol{x} = C' </math>for some other constant <math>C' </math>. The condition of tangency is then<math display="block">(\boldsymbol{p}_i + \boldsymbol{p}_j \, dt)^\mathrm{T}\boldsymbol{A} (\boldsymbol{p}_i + \boldsymbol{p}_j \, dt) = C' + O(dt^2), \quad \forall i \neq j\text{.} </math>Expanding the left side gives <math>C' + 2\,\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_j \, dt + O(dt^2)</math>, so the first-order term must vanish; that is,
:<math>\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_j=0\text{.}</math>
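As an illustration (a sketch under the assumption that the directions are built by Gram–Schmidt in the inner product <math>\langle\boldsymbol{u},\boldsymbol{v}\rangle = \boldsymbol{u}^\mathrm{T}\boldsymbol{Av}</math>, starting from the standard basis), mutually conjugate directions satisfy this condition and drive the conjugate direction method to the exact solution in <math>n</math> steps:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)              # symmetric positive-definite
b = rng.standard_normal(n)

# A-orthogonalize the standard basis by Gram-Schmidt in the A-inner product
directions = []
for v in np.eye(n):
    for p in directions:
        v = v - (p @ A @ v) / (p @ A @ p) * p
    directions.append(v)
P = np.column_stack(directions)

G = P.T @ A @ P                          # diagonal: p_i^T A p_j = 0 for i != j
assert np.allclose(G - np.diag(np.diag(G)), 0)

# the conjugate direction method now converges in exactly n steps
x, r = np.zeros(n), b.copy()
for i in range(n):
    p = P[:, i]
    alpha = (p @ r) / (p @ A @ p)
    x += alpha * p
    r -= alpha * (A @ p)
assert np.allclose(x, np.linalg.solve(A, b))
</syntaxhighlight>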
The conjugate direction method is imprecise in the sense that no formulae are given for selection of the directions <math>\boldsymbol{p}_0,\boldsymbol{p}_1,\boldsymbol{p}_2,\ldots</math>. Specific choices lead to various methods including the conjugate gradient method and [[Gaussian elimination]].
== Derivation from the Arnoldi/Lanczos iteration ==
{{see|Arnoldi iteration|Lanczos iteration}}
The conjugate gradient method can also be seen as a variant of the Arnoldi/Lanczos iteration applied to solving linear systems.