If the directions <math>\boldsymbol{p}_0,\boldsymbol{p}_1,\boldsymbol{p}_2,\ldots</math> are not chosen well, then progress will be slow. In particular, the gradient descent method is slow in this way. This can be seen in the diagram, where the green line is the result of always picking the local gradient direction: it zig-zags towards the minimum, but repeatedly overshoots. In contrast, if we pick the directions to be a set of '''mutually conjugate directions''', then there is no overshoot, and we obtain the global minimum after <math>n</math> steps, where <math>n</math> is the number of dimensions.
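This behavior can be checked numerically. The following is a minimal pure-Python sketch; the 2×2 positive-definite matrix <math>\boldsymbol{A}</math>, the vector <math>\boldsymbol{b}</math>, and the starting directions are illustrative choices, not from the text. It minimizes the quadratic <math>f(\boldsymbol{x}) = \tfrac{1}{2}\boldsymbol{x}^\mathrm{T}\boldsymbol{A}\boldsymbol{x} - \boldsymbol{b}^\mathrm{T}\boldsymbol{x}</math> by exact line searches along two conjugate directions, reaching the exact minimizer in <math>n = 2</math> steps.

```python
# Minimizing f(x) = 1/2 x^T A x - b^T x along two A-conjugate directions.
# The 2x2 SPD matrix A and the vector b below are illustrative examples.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]

def matvec(M, v):
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# Build a conjugate pair: p1 is e1 with its A-component along p0 removed.
p0 = [1.0, 0.0]
e1 = [0.0, 1.0]
beta = dot(p0, matvec(A, e1)) / dot(p0, matvec(A, p0))
p1 = [e1[i] - beta * p0[i] for i in range(2)]
assert abs(dot(p0, matvec(A, p1))) < 1e-12  # conjugacy: p0^T A p1 = 0

# Exact line search along each direction: alpha = p^T r / p^T A p,
# where r = b - A x is the residual (the negative gradient of f).
x = [0.0, 0.0]
for p in (p0, p1):
    Ax = matvec(A, x)
    r = [b[i] - Ax[i] for i in range(2)]
    alpha = dot(p, r) / dot(p, matvec(A, p))
    x = [x[i] + alpha * p[i] for i in range(2)]

# After n = 2 steps, x solves A x = b exactly: x = [1/11, 7/11].
print(x)
```

With these example values the exact solution of <math>\boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}</math> is <math>(1/11, 7/11)</math>, and the loop reaches it in exactly two steps, with no overshoot.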
[[File:Conjugate_Diameters.svg|right|thumb|300x300px|Two conjugate diameters of an [[ellipse]]. Each edge of the bounding [[parallelogram]] is [[Parallel (geometry)|parallel]] to one of the diameters.]]
The concept of conjugate directions comes from the classical geometry of ellipses. For an ellipse, two diameters through the center are mutually conjugate with respect to the ellipse iff each edge of the tangent bounding parallelogram is parallel to one of them, as pictured. The concept generalizes to ''n''-dimensional ellipsoids, where ''n'' semi-axes <math>t_0\boldsymbol{p}_0, \dots, t_{n-1}\boldsymbol{p}_{n-1}</math> are mutually conjugate with respect to the ellipsoid iff each axis is parallel to the tangent bounding [[parallelepiped]]. In other words, for any <math>i</math>, the tangent plane to the ellipsoid at <math>\boldsymbol{c} + t_i \boldsymbol{p}_i</math> is a hyperplane spanned by the vectors <math>\{\boldsymbol{p}_j : j \neq i\}</math>, where <math>\boldsymbol{c} </math> is the center of the ellipsoid.
Note that we need to scale each directional vector <math>\boldsymbol{p}_i </math> by a scalar <math>t_i </math>, so that <math>\boldsymbol{c} + t_i \boldsymbol{p}_i</math> falls exactly on the ellipsoid.
Given an ellipsoid with equation <math display="block">\boldsymbol{x}^\mathrm{T}\boldsymbol{A}\boldsymbol{x}-2\boldsymbol{b}^\mathrm{T}\boldsymbol{x} = C </math>for some constant <math>C </math>, we can translate it so that its center is at the origin. This changes the equation to <math display="block">\boldsymbol{x}^\mathrm{T}\boldsymbol{A}\boldsymbol{x} = C' </math>for some other constant <math>C' </math>. The condition of tangency is then<math display="block">(t_i\boldsymbol{p}_i + \boldsymbol{p}_j\,dt)^\mathrm{T}\boldsymbol{A} (t_i\boldsymbol{p}_i + \boldsymbol{p}_j\,dt) = C' + O(dt^2), \quad \forall i \neq j.</math>Expanding the left side and using <math>t_i^2\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_i = C'</math>, the first-order term in <math>dt</math> must vanish, that is,
:<math>\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_j=0</math>
for any <math>i\neq j</math>.
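The tangency characterization can be verified numerically: on the centered ellipsoid <math>\boldsymbol{x}^\mathrm{T}\boldsymbol{A}\boldsymbol{x} = C'</math>, the gradient at a point <math>\boldsymbol{x}</math> is <math>2\boldsymbol{A}\boldsymbol{x}</math>, so a direction <math>\boldsymbol{p}_j</math> lies in the tangent plane at <math>t_i\boldsymbol{p}_i</math> exactly when <math>\boldsymbol{p}_j^\mathrm{T}\boldsymbol{A}\boldsymbol{p}_i = 0</math>. A small sketch, where the ellipse matrix and the directions are assumed examples:

```python
import math

# Centered ellipse x^T A x = C' with an assumed example matrix.
A = [[2.0, 1.0], [1.0, 2.0]]
C = 2.0

def matvec(M, v):
    return [sum(M[i][k] * v[k] for k in range(2)) for i in range(2)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

p0 = [1.0, 0.0]
p1 = [1.0, -2.0]  # chosen so that p1^T A p0 = 0, i.e. conjugate to p0
assert abs(dot(p1, matvec(A, p0))) < 1e-12

# Scale p0 so that t0*p0 lies exactly on the ellipse: t0^2 p0^T A p0 = C'.
t0 = math.sqrt(C / dot(p0, matvec(A, p0)))
x = [t0 * p0[0], t0 * p0[1]]

# The gradient of x^T A x at x is 2 A x; the tangent line at x consists of
# the directions with zero directional derivative.  p1 is such a direction:
grad = [2.0 * g for g in matvec(A, x)]
print(dot(grad, p1))  # 0.0: p1 is parallel to the tangent at t0*p0
```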
The conjugate direction method is imprecise in the sense that no formulae are given for selection of the directions <math>\boldsymbol{p}_0,\boldsymbol{p}_1,\boldsymbol{p}_2,\ldots</math>. Specific choices lead to various methods including the conjugate gradient method and [[Gaussian elimination]].
=== Gram–Schmidt process ===
We can tabulate the equations that we need to set to zero:
{| class="wikitable"
|+
!
! 0
! 1
! 2
! 3
! <math>\cdots</math>
|-
! 0
|
| <math>\boldsymbol{p}_0^\mathrm{T}\boldsymbol{Ap}_1</math>
| <math>\boldsymbol{p}_0^\mathrm{T}\boldsymbol{Ap}_2</math>
| <math>\boldsymbol{p}_0^\mathrm{T}\boldsymbol{Ap}_3</math>
| <math>\cdots</math>
|-
! 1
|
|
| <math>\boldsymbol{p}_1^\mathrm{T}\boldsymbol{Ap}_2</math>
| <math>\boldsymbol{p}_1^\mathrm{T}\boldsymbol{Ap}_3</math>
| <math>\cdots</math>
|-
! 2
|
|
|
| <math>\boldsymbol{p}_2^\mathrm{T}\boldsymbol{Ap}_3</math>
| <math>\cdots</math>
|-
! <math>\vdots</math>
|
|
|
|
| <math>\ddots</math>
|}
This resembles the problem of orthonormalization, which requires <math>\boldsymbol{p}_i^\mathrm{T}\boldsymbol{p}_j=0</math> for any <math>i\neq j</math>, and <math>\boldsymbol{p}_i^\mathrm{T}\boldsymbol{p}_i=1</math> for each <math>i</math>. Thus the problem of finding conjugate directions is less constrained than the problem of orthonormalization, so the [[Gram–Schmidt process]] works, with additional degrees of freedom that we can later use to pick choices that simplify the computation:
* Arbitrarily set <math>\boldsymbol{p}_0</math>.
* Arbitrarily set <math>\boldsymbol{p}_{10}</math>, then modify it to <math>\boldsymbol{p}_{1} = \boldsymbol{p}_{10} -
\frac{\boldsymbol{p}_0^\mathrm{T}\boldsymbol{Ap}_{10}}{\boldsymbol{p}_0^\mathrm{T}\boldsymbol{Ap}_0}\boldsymbol{p}_{0}</math>.
* Arbitrarily set <math>\boldsymbol{p}_{20}</math>, then modify it to <math>\boldsymbol{p}_{2} = \boldsymbol{p}_{20} - \sum_{i = 0}^1
\frac{\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_{20}}{\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_i}\boldsymbol{p}_{i}</math>.
* ...
* Arbitrarily set <math>\boldsymbol{p}_{n-1,0}</math>, then modify it to <math>\boldsymbol{p}_{n-1} = \boldsymbol{p}_{n-1,0} - \sum_{i = 0}^{n-2}
\frac{\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_{n-1,0}}{\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_i}\boldsymbol{p}_{i} </math>.
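The steps above can be sketched directly in code. In this pure-Python sketch the 3×3 symmetric positive-definite matrix <math>\boldsymbol{A}</math> is an assumed example, and the arbitrary starting vectors <math>\boldsymbol{p}_{i0}</math> are taken to be the standard basis; any linearly independent set would do.

```python
# Gram-Schmidt conjugation: turn arbitrary starting vectors into a set of
# mutually A-conjugate directions.  A is an assumed 3x3 SPD example matrix;
# the starting vectors are the standard basis vectors.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
n = 3

def matvec(M, v):
    return [sum(M[i][k] * v[k] for k in range(n)) for i in range(n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

starts = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

ps = []
for v in starts:
    p = list(v)
    # Subtract the A-projection onto every previously built direction:
    # p_k = p_{k0} - sum_i (p_i^T A p_{k0} / p_i^T A p_i) p_i.
    for q in ps:
        coeff = dot(q, matvec(A, v)) / dot(q, matvec(A, q))
        p = [p[i] - coeff * q[i] for i in range(n)]
    ps.append(p)

# All pairs are now A-conjugate: p_i^T A p_j = 0 for i != j.
for i in range(n):
    for j in range(i + 1, n):
        assert abs(dot(ps[i], matvec(A, ps[j]))) < 1e-12
```

Note that only the off-diagonal conditions <math>\boldsymbol{p}_i^\mathrm{T}\boldsymbol{Ap}_j=0</math> are enforced; no normalization is needed, which is one of the degrees of freedom mentioned above.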
==Arnoldi/Lanczos iteration==