:Minimize <math> f(\mathbf{x})</math>
:subject to <math> \mathbf{x} \in \mathcal{D}</math>.
where the function <math> f</math> is [[Convex function|convex]] and [[differentiable function|differentiable]], and the ___domain (feasible set) <math>\mathcal{D}</math> is a [[Convex set|convex]] and bounded set in some [[vector space]].
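As a concrete instance (an illustrative example, not part of the original text), one may take a quadratic objective over the probability simplex, a standard compact convex feasible set:

```latex
\min_{\mathbf{x} \in \mathcal{D}} \; f(\mathbf{x}) = \lVert \mathbf{x} - \mathbf{b} \rVert_2^2,
\qquad
\mathcal{D} = \Delta_n = \bigl\{ \mathbf{x} \in \mathbb{R}^n : x_i \geq 0,\ \textstyle\sum_{i=1}^n x_i = 1 \bigr\}.
```

Here <math>\mathbf{b}</math> is a fixed vector; this instance is convenient because the linear subproblem below has a closed-form solution over the simplex.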
==Algorithm==
::Minimize <math> \mathbf{s}^T \nabla f(\mathbf{x}_k)</math>
::Subject to <math>\mathbf{s} \in \mathcal{D}</math>
:''(Interpretation: Minimize the linear approximation of the problem given by the first-order [[Taylor series|Taylor approximation]] of <math>f</math> around <math>\mathbf{x}_k \!</math>.)''
:'''Step 2.''' ''Step size determination:'' Set <math>\gamma \leftarrow \frac{2}{k+2}</math>, or alternatively find <math>\gamma</math> that minimizes <math> f(\mathbf{x}_k+\gamma(\mathbf{s}_k -\mathbf{x}_k))</math> subject to <math>0 \le \gamma \le 1</math> .
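The steps above, together with the update <math>\mathbf{x}_{k+1} \leftarrow \mathbf{x}_k + \gamma(\mathbf{s}_k - \mathbf{x}_k)</math>, can be sketched in Python. This is an illustrative implementation, not taken from the original text; the quadratic objective and the simplex feasible set are assumptions chosen so that the direction-finding subproblem has a closed-form solution.

```python
import numpy as np

def frank_wolfe(grad, lmo, x0, num_iters=500):
    """Frank-Wolfe with the predetermined step size gamma = 2/(k+2).

    grad -- gradient of the convex objective f
    lmo  -- linear minimization oracle: returns argmin_{s in D} of s . g
    x0   -- a feasible starting point in D
    """
    x = np.asarray(x0, dtype=float)
    for k in range(num_iters):
        g = grad(x)
        s = lmo(g)                 # Step 1: direction-finding subproblem
        gamma = 2.0 / (k + 2.0)    # Step 2: predetermined step size
        x = x + gamma * (s - x)    # convex combination, so x stays in D
    return x

# Assumed example: minimize f(x) = ||x - b||^2 over the probability simplex
# D = {x : x >= 0, sum(x) = 1}.  Over the simplex, the linear subproblem is
# minimized at the vertex e_i with the smallest gradient coordinate.
b = np.array([0.1, 0.7, 0.2])
grad = lambda x: 2.0 * (x - b)

def lmo_simplex(g):
    s = np.zeros_like(g)
    s[np.argmin(g)] = 1.0
    return s

x_opt = frank_wolfe(grad, lmo_simplex, [1.0, 0.0, 0.0])
```

Note that the iterate is always a convex combination of feasible points, so no projection step is needed; this is the main practical appeal of the method when the linear minimization oracle is cheap.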
While the worst-case convergence rate of <math>O(1/k)</math> cannot be improved in general, faster convergence can be obtained for special problem classes, such as some strongly convex problems.<ref>{{Cite book|title=Nonlinear Programming|first= Dimitri |last=Bertsekas|year= 2003|page= 222|publisher= Athena Scientific| isbn =1-886529-00-0}}</ref>
==Lower bounds on the solution value, and primal-dual analysis==
Since <math>f</math> is convex, <math>f(\mathbf{y})</math> is always above the [[Tangent|tangent plane]] of <math>f</math> at any point <math>\mathbf{x} \in \mathcal{D}</math>:
:<math>
f(\mathbf{y}) \geq f(\mathbf{x}) + (\mathbf{y} - \mathbf{x})^T \nabla f(\mathbf{x}) .
</math>
This holds in particular for the (unknown) optimal solution <math>\mathbf{x}^*</math>. The best lower bound with respect to a given point <math>\mathbf{x}</math> is given by
:<math>
f(\mathbf{x}^*) \geq \min_{\mathbf{y} \in \mathcal{D}} \left\{ f(\mathbf{x}) + (\mathbf{y} - \mathbf{x})^T \nabla f(\mathbf{x}) \right\} .
</math>
The minimization on the right-hand side is exactly the direction-finding subproblem of the Frank–Wolfe algorithm, so the solution <math>\mathbf{s}_k</math> computed in the <math>k</math>-th iteration can be used to determine increasing lower bounds <math>l_k</math> by setting <math>l_0 = -\infty</math> and
:<math>
l_k := \max (l_{k - 1}, f(\mathbf{x}_k) + (\mathbf{s}_k - \mathbf{x}_k)^T \nabla f(\mathbf{x}_k)) .
</math>
Such lower bounds on the unknown optimal value are important in practice because they can be used as a stopping criterion and give an efficient certificate of the approximation quality in every iteration, since <math>l_k \leq f(\mathbf{x}^*) \leq f(\mathbf{x}_k)</math> always holds.
It has been shown that this corresponding [[duality gap]], that is, the difference between <math>f(\mathbf{x}_k)</math> and the lower bound <math>l_k</math>, decreases with the same convergence rate, i.e.
:<math>
f(\mathbf{x}_k) - l_k = O(1/k) .
</math>
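The lower bound costs nothing extra to maintain, since the inner product <math>(\mathbf{s}_k - \mathbf{x}_k)^T \nabla f(\mathbf{x}_k)</math> reuses the solution of the direction-finding subproblem. The following sketch (an illustration under the same assumed quadratic-over-simplex instance, not from the original text) tracks <math>l_k</math> and stops once the duality gap certifies the desired accuracy:

```python
import numpy as np

def frank_wolfe_certified(f, grad, lmo, x0, tol=1e-2, max_iters=5000):
    """Frank-Wolfe that maintains the lower bound l_k and stops once the
    duality gap f(x_k) - l_k falls below tol."""
    x = np.asarray(x0, dtype=float)
    lower = -np.inf                        # l_0 = -infinity
    for k in range(max_iters):
        g = grad(x)
        s = lmo(g)                         # direction-finding subproblem
        # The linearization at x_k, evaluated at its minimizer s_k, is a
        # valid lower bound on f(x*); keep the best one seen so far.
        lower = max(lower, f(x) + (s - x) @ g)
        gap = f(x) - lower                 # certifies f(x_k) - f(x*) <= gap
        if gap <= tol:
            break
        x = x + 2.0 / (k + 2.0) * (s - x)  # standard step size 2/(k+2)
    return x, lower, gap

# Assumed example: f(x) = ||x - b||^2 over the probability simplex.
b = np.array([0.1, 0.7, 0.2])
f = lambda x: float(np.dot(x - b, x - b))
grad = lambda x: 2.0 * (x - b)

def lmo_simplex(g):
    s = np.zeros_like(g)
    s[np.argmin(g)] = 1.0
    return s

x_cert, lower, gap = frank_wolfe_certified(f, grad, lmo_simplex, [1.0, 0.0, 0.0])
```

On termination, `gap` bounds the true suboptimality <math>f(\mathbf{x}_k) - f(\mathbf{x}^*)</math>, even though <math>\mathbf{x}^*</math> itself is never known.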