:Minimize <math> f(\mathbf{x})</math>
:subject to <math> \mathbf{x} \in \mathcal{D}</math>.
where the function <math> f</math> is [[Convex function|convex]] and [[differentiable function|differentiable]], and the ___domain (feasible set) <math>\mathcal{D}</math> is a [[Convex set|convex]] and bounded set in some [[vector space]].
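As a concrete instance (an illustrative example, not part of the original text), one may take a quadratic objective over the probability simplex, a standard compact convex feasible set:

```latex
\min_{\mathbf{x} \in \mathcal{D}} \; f(\mathbf{x}) = \lVert \mathbf{x} - \mathbf{b} \rVert_2^2,
\qquad
\mathcal{D} = \Delta_n = \bigl\{ \mathbf{x} \in \mathbb{R}^n : x_i \geq 0,\ \textstyle\sum_{i=1}^n x_i = 1 \bigr\}.
```

Here <math>\mathbf{b}</math> is a fixed vector; this instance is convenient because the linear subproblem below has a closed-form solution over the simplex.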
==Algorithm==
::Minimize <math> \mathbf{s}^T \nabla f(\mathbf{x}_k)</math>
::Subject to <math>\mathbf{s} \in \mathcal{D}</math>
:''(Interpretation: Minimize the linear approximation of the problem given by the first-order [[Taylor series|Taylor approximation]] of <math>f</math> around <math>\mathbf{x}_k \!</math>.)''
:'''Step 2.''' ''Step size determination:'' Set <math>\gamma \leftarrow \frac{2}{k+2}</math>, or alternatively find <math>\gamma</math> that minimizes <math> f(\mathbf{x}_k+\gamma(\mathbf{s}_k -\mathbf{x}_k))</math> subject to <math>0 \le \gamma \le 1</math> .
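The steps above, together with the update <math>\mathbf{x}_{k+1} \leftarrow \mathbf{x}_k + \gamma(\mathbf{s}_k - \mathbf{x}_k)</math>, can be sketched in Python. This is an illustrative implementation, not taken from the original text; the quadratic objective and the simplex feasible set are assumptions chosen so that the direction-finding subproblem has a closed-form solution.

```python
import numpy as np

def frank_wolfe(grad, lmo, x0, num_iters=500):
    """Frank-Wolfe with the predetermined step size gamma = 2/(k+2).

    grad -- gradient of the convex objective f
    lmo  -- linear minimization oracle: returns argmin_{s in D} of s . g
    x0   -- a feasible starting point in D
    """
    x = np.asarray(x0, dtype=float)
    for k in range(num_iters):
        g = grad(x)
        s = lmo(g)                 # Step 1: direction-finding subproblem
        gamma = 2.0 / (k + 2.0)    # Step 2: predetermined step size
        x = x + gamma * (s - x)    # convex combination, so x stays in D
    return x

# Assumed example: minimize f(x) = ||x - b||^2 over the probability simplex
# D = {x : x >= 0, sum(x) = 1}.  Over the simplex, the linear subproblem is
# minimized at the vertex e_i with the smallest gradient coordinate.
b = np.array([0.1, 0.7, 0.2])
grad = lambda x: 2.0 * (x - b)

def lmo_simplex(g):
    s = np.zeros_like(g)
    s[np.argmin(g)] = 1.0
    return s

x_opt = frank_wolfe(grad, lmo_simplex, [1.0, 0.0, 0.0])
```

Note that the iterate is always a convex combination of feasible points, so no projection step is needed; this is the main practical appeal of the method when the linear minimization oracle is cheap.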
While the worst-case convergence rate of <math>O(1/k)</math> cannot be improved in general, faster convergence can be obtained for special problem classes, such as some strongly convex problems.<ref>{{Cite book|title=Nonlinear Programming|first= Dimitri |last=Bertsekas|year= 2003|page= 222|publisher= Athena Scientific| isbn =1-886529-00-0}}</ref>
==Lower bounds on the solution value, and primal-dual analysis==
Since <math>f</math> is convex, <math>f(\mathbf{y})</math> is always above the [[Tangent|tangent plane]] of <math>f</math> at any point <math>\mathbf{x} \in \mathcal{D}</math>:
:<math>
f(\mathbf{y}) \geq f(\mathbf{x}) + (\mathbf{y} - \mathbf{x})^T \nabla f(\mathbf{x}) .
</math>
This holds in particular for the (unknown) optimal solution <math>\mathbf{x}^*</math>. The best lower bound with respect to a given point <math>\mathbf{x}</math> is given by
:<math>
f(\mathbf{x}^*) \geq \min_{\mathbf{y} \in \mathcal{D}} \left\{ f(\mathbf{x}) + (\mathbf{y} - \mathbf{x})^T \nabla f(\mathbf{x}) \right\} .
</math>
The minimization on the right-hand side is exactly the direction-finding subproblem of the Frank–Wolfe algorithm, so the solution <math>\mathbf{s}_k</math> computed in the <math>k</math>-th iteration can be used to determine increasing lower bounds <math>l_k</math> by setting <math>l_0 = -\infty</math> and
:<math>
l_k := \max (l_{k - 1}, f(\mathbf{x}_k) + (\mathbf{s}_k - \mathbf{x}_k)^T \nabla f(\mathbf{x}_k)) .
</math>
Such lower bounds on the unknown optimal value are important in practice because they can be used as a stopping criterion and give an efficient certificate of the approximation quality in every iteration, since <math>l_k \leq f(\mathbf{x}^*) \leq f(\mathbf{x}_k)</math> always holds.
It has been shown that this corresponding [[duality gap]], that is, the difference between <math>f(\mathbf{x}_k)</math> and the lower bound <math>l_k</math>, decreases with the same convergence rate, i.e.
:<math>
f(\mathbf{x}_k) - l_k = O(1/k) .
</math>
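The lower bound costs nothing extra to maintain, since the inner product <math>(\mathbf{s}_k - \mathbf{x}_k)^T \nabla f(\mathbf{x}_k)</math> reuses the solution of the direction-finding subproblem. The following sketch (an illustration under the same assumed quadratic-over-simplex instance, not from the original text) tracks <math>l_k</math> and stops once the duality gap certifies the desired accuracy:

```python
import numpy as np

def frank_wolfe_certified(f, grad, lmo, x0, tol=1e-2, max_iters=5000):
    """Frank-Wolfe that maintains the lower bound l_k and stops once the
    duality gap f(x_k) - l_k falls below tol."""
    x = np.asarray(x0, dtype=float)
    lower = -np.inf                        # l_0 = -infinity
    for k in range(max_iters):
        g = grad(x)
        s = lmo(g)                         # direction-finding subproblem
        # The linearization at x_k, evaluated at its minimizer s_k, is a
        # valid lower bound on f(x*); keep the best one seen so far.
        lower = max(lower, f(x) + (s - x) @ g)
        gap = f(x) - lower                 # certifies f(x_k) - f(x*) <= gap
        if gap <= tol:
            break
        x = x + 2.0 / (k + 2.0) * (s - x)  # standard step size 2/(k+2)
    return x, lower, gap

# Assumed example: f(x) = ||x - b||^2 over the probability simplex.
b = np.array([0.1, 0.7, 0.2])
f = lambda x: float(np.dot(x - b, x - b))
grad = lambda x: 2.0 * (x - b)

def lmo_simplex(g):
    s = np.zeros_like(g)
    s[np.argmin(g)] = 1.0
    return s

x_cert, lower, gap = frank_wolfe_certified(f, grad, lmo_simplex, [1.0, 0.0, 0.0])
```

On termination, `gap` bounds the true suboptimality <math>f(\mathbf{x}_k) - f(\mathbf{x}^*)</math>, even though <math>\mathbf{x}^*</math> itself is never known.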