Frank–Wolfe algorithm

The '''Frank–Wolfe algorithm''' is an [[iterative method|iterative]] [[First-order approximation|first-order]] [[Mathematical optimization|optimization]] [[algorithm]] for [[constrained optimization|constrained]] [[convex optimization]]. Also known as the '''conditional gradient method''', the '''reduced gradient algorithm''' and the '''convex combination algorithm''', it was originally proposed by [[Marguerite Frank]] and [[Philip Wolfe]] in 1956 as an algorithm for [[quadratic programming]].<ref>{{cite journal |last1=Frank |first1=M. |last2=Wolfe |first2=P. |title=An algorithm for quadratic programming |journal=Naval Research Logistics Quarterly |volume=3 |issue=1–2 |year=1956 |pages=95–110 |doi=10.1002/nav.3800030109}}</ref> In each iteration, the Frank–Wolfe algorithm considers a [[linear approximation]] of the [[objective function]], and moves slightly towards a minimizer of this linear function (taken over the same ___domain).
 
While competing methods such as [[gradient descent]] for constrained optimization require a [[Projection (mathematics)|projection step]] back to the feasible set in each iteration, the Frank–Wolfe algorithm only needs the solution of a linear problem over the same set in each iteration, and automatically stays in the feasible set.
The Frank–Wolfe algorithm is an iterative method for [[nonlinear programming]] and is not restricted to quadratic programming. Although it is slower than competing methods for general-purpose nonlinear programming, it remains widely used and effective for specially structured problems of large-scale optimization, which often have enormous size.
The convergence of the Frank–Wolfe algorithm is sublinear in general: the error to the optimum is <math>O(1/k)</math> after ''k'' iterations.
The iterates of the algorithm can always be represented as a sparse convex combination of the extreme points of the feasible set, which has contributed to the popularity of the algorithm for sparse greedy optimization in [[machine learning]] and [[signal processing]] problems,<ref>{{cite doi|10.1145/1824777.1824783}}</ref> as well as, for example, for finding approximate [[flow network|minimum–cost flow]]s in [[transportation network]]s.
 
==Problem statement==
 
:Minimize <math>f(\mathbf{x})</math>
:subject to <math>\mathbf{x} \in \mathcal{D}</math>,
where the function <math>f</math> is [[convex function|convex]] and differentiable, and the ___domain <math>\mathcal{D}</math> is a convex and bounded set. In the original quadratic programming setting, <math>f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^{\mathrm{T}} E\mathbf{x} + \mathbf{h}^{\mathrm{T}} \mathbf{x}</math> with an ''n''×''n'' [[positive-semidefinite matrix]] <math>E</math> and an ''n''×1 vector <math>\mathbf{h}</math>, and <math>\mathcal{D}</math> is a [[polyhedron|polyhedral]] set defined by linear inequality and equality constraints (for example <math>A\mathbf{x} \le \mathbf{b}</math>, <math>C\mathbf{x} = \mathbf{d}</math>).
 
==Algorithm==
[[File:Frank-Wolfe-Algorithm.png|thumbnail|right|A step of the Frank-Wolfe algorithm]]
 
'''Initialization.''' Let <math>k \leftarrow 0</math>, and let <math>\mathbf{x}_0 \!</math> be any point in <math>\mathcal{D}</math>.
 
'''Step 1.''' Direction-finding subproblem. Consider the approximation of the problem obtained by replacing the function <math>f</math> with its first-order [[Taylor series|Taylor expansion]] around <math>\mathbf{x}_k \!</math>, and solve for <math>\mathbf{s}</math>:
:Minimize <math>\mathbf{s}^{\mathrm{T}} \nabla f(\mathbf{x}_k)</math>
:Subject to <math>\mathbf{s} \in \mathcal{D}</math>
 
'''Step 2.''' Step size determination. Set <math>\gamma \leftarrow \frac{2}{k+2}</math>, or alternatively find <math>\gamma \!</math> that minimizes <math>f(\mathbf{x}_k+\gamma(\mathbf{s}-\mathbf{x}_k))</math> subject to <math>0 \le \gamma \le 1</math>.
 
'''Step 3.''' Update. Let <math>\mathbf{x}_{k+1}\leftarrow \mathbf{x}_k+\gamma(\mathbf{s}-\mathbf{x}_k)</math>, let <math>k \leftarrow k+1</math> and go back to Step 1.
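
The sketch below (in Python, using [[NumPy]]) illustrates these three steps for one special case, the probability [[simplex]], where the subproblem of Step 1 has a closed-form solution; the function name <code>frank_wolfe_simplex</code> and the quadratic objective in the usage example are chosen purely for illustration and are not part of the original algorithm description.

<syntaxhighlight lang="python">
import numpy as np


def frank_wolfe_simplex(grad, x0, num_iters=100):
    """Frank-Wolfe iteration over the probability simplex {x >= 0, sum(x) = 1}.

    grad : callable returning the gradient of the convex objective at x.
    x0   : any feasible starting point in the simplex.
    """
    x = x0.copy()
    for k in range(num_iters):
        g = grad(x)
        # Step 1 (direction-finding subproblem): over the simplex, the
        # minimizer of the linear function <s, g> is the vertex whose
        # gradient coordinate is smallest.
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0
        # Step 2 (step size): the standard 2/(k+2) rule.
        gamma = 2.0 / (k + 2)
        # Step 3 (update): a convex combination of feasible points,
        # so the iterate stays feasible automatically.
        x = x + gamma * (s - x)
    return x


# Usage example: minimize f(x) = 1/2 x^T E x + h^T x over the simplex,
# with E positive semidefinite (the quadratic case discussed above).
E = np.array([[2.0, 0.5], [0.5, 1.0]])
h = np.array([-1.0, -0.5])
x_approx = frank_wolfe_simplex(lambda x: E @ x + h, x0=np.array([0.5, 0.5]))
print(x_approx)
</syntaxhighlight>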
 
==Comments==
If the feasible set is given by a set of linear constraints, then the subproblem to be solved in each iteration becomes a [[linear programming|linear program]].
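For example, if <math>\mathcal{D}</math> is the box <math>\{\mathbf{x} : \mathbf{l} \le \mathbf{x} \le \mathbf{u}\}</math>, this linear program is solved coordinate-wise in closed form: <math>s_i = l_i</math> whenever <math>[\nabla f(\mathbf{x}_k)]_i > 0</math> and <math>s_i = u_i</math> otherwise, so no general-purpose linear programming solver is needed in this case.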
 
The algorithm generally makes good progress towards the optimum during the first few iterations, but convergence often slows down substantially when close to the minimum point. For this reason the algorithm is perhaps best used to find an approximate solution. It can be shown that the worst-case convergence rate is [[sublinear]]; however, in practice the convergence rate has been observed to improve in the presence of many constraints.<ref>{{cite book |first=Dimitri |last=Bertsekas |title=Nonlinear Programming |publisher=Athena Scientific |year=2003 |page=222 |isbn=1-886529-00-0}}</ref>
==References==
*[http://www.math.chalmers.se/Math/Grundutb/CTH/tma946/0203/fw_eng.pdf The Frank-Wolfe algorithm] description
:(Note: formulation (2) in this reference should omit the term <math>f(\mathbf{x}_k)</math> for the subsequent discussion there to be valid and understandable.)
*[http://swutc.tamu.edu/publications/technicalreports/472840-00074-1.pdf Combined Traffic Signal Control and Traffic Assignment: Algorithms, Implementation and Numerical Results], Chungwon Lee and [[Randy B. Machemehl]], University of Texas at Austin, March 2005