Frank–Wolfe algorithm
The Frank–Wolfe algorithm is an [[iterative method|iterative]] [[First-order approximation|first-order]] [[Mathematical optimization|optimization]] [[algorithm]] for [[constrained optimization|constrained]] [[convex optimization]]. Also known as the '''conditional gradient method'''<ref>{{Cite doi|10.1016/0041-5553(66)90114-5}}</ref>, the '''reduced gradient algorithm''' and the '''convex combination algorithm''', the method was originally proposed by [[Marguerite Frank]] and [[Philip Wolfe]] in&nbsp;1956<ref>{{cite journal |last1=Frank |first1=M. |last2=Wolfe |first2=P. |title=An algorithm for quadratic programming |journal=Naval Research Logistics Quarterly |volume=3 |number=1-2 |year=1956 |pages=95–110|doi=10.1002/nav.3800030109}}</ref>. In each iteration, the Frank–Wolfe algorithm considers a [[linear approximation]] of the objective function, and moves towards a minimizer of this linear function (taken over the same ___domain).
 
 
==Problem statement==
The Frank–Wolfe algorithm solves the optimization problem
:Minimize <math> f(\mathbf{x})</math>
:subject to <math> \mathbf{x} \in \mathcal{D}</math>.
where the function <math>f</math> is [[Convex function|convex]] and [[differentiable function|differentiable]], and the ___domain (feasible set) <math>\mathcal{D}</math> is [[convex set|convex]] and compact.
 
==Algorithm==
[[File:Frank-Wolfe-Algorithm.png|thumbnail|right|A step of the Frank-Wolfe algorithm]]
 
:Initialization. Let <math>k \leftarrow 0</math> and let <math>\mathbf{x}_0 \!</math> be any point in <math>\mathcal{D}</math>.
 
:'''Step 1.''' Direction-finding subproblem. Find <math>\mathbf{s}</math> solving
::Minimize <math>\mathbf{s}^T \nabla f(\mathbf{x}_k)</math>
::Subject to <math>\mathbf{s} \in \mathcal{D}</math>
:Interpretation: Minimize the linear approximation of the problem given by the first-order [[Taylor series|Taylor approximation]] of f around <math>\mathbf{x}_k \!</math>.
 
:'''Step 2.''' Step size determination. Set <math>\gamma \leftarrow \frac{2}{k+2}</math>, or alternatively find <math>\gamma \!</math> that minimizes <math> f(\mathbf{x}_k+\gamma(\mathbf{s}-\mathbf{x}_k))</math> subject to <math>0 \le \gamma \le 1</math>.
 
:'''Step 3.''' Update. Let <math>\mathbf{x}_{k+1}\leftarrow \mathbf{x}_k+\gamma(\mathbf{s}-\mathbf{x}_k)</math>, let <math>k \leftarrow k+1</math> and go back to Step 1.
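The steps above can be sketched in plain Python. The quadratic objective and the unit-simplex ___domain below are illustrative choices (not part of the algorithm statement itself); over the simplex, the direction-finding subproblem is solved exactly by the vertex <math>\mathbf{e}_i</math> whose gradient coordinate is smallest.

```python
# A sketch of the Frank-Wolfe algorithm, assuming an illustrative problem:
# minimize f(x) = ||x - b||^2 over the unit simplex {x >= 0, sum(x) = 1}.

def frank_wolfe(grad, lmo, x0, iters=500):
    x = list(x0)
    for k in range(iters):
        g = grad(x)
        s = lmo(g)                       # Step 1: direction-finding subproblem
        gamma = 2.0 / (k + 2.0)          # Step 2: standard step size 2/(k+2)
        # Step 3: move towards s; the iterate stays a convex combination
        # of extreme points, hence feasible.
        x = [(1 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x

b = [0.3, 0.3, 0.4]                      # the optimum lies inside the simplex
grad = lambda x: [2 * (xi - bi) for xi, bi in zip(x, b)]

def simplex_lmo(g):
    # A linear function over the simplex is minimized at a vertex e_i,
    # where i is the index of the smallest gradient coordinate.
    i = min(range(len(g)), key=lambda j: g[j])
    return [1.0 if j == i else 0.0 for j in range(len(g))]

x = frank_wolfe(grad, simplex_lmo, x0=[1.0, 0.0, 0.0])
```

Note that no projection is ever computed: feasibility is preserved automatically because each update is a convex combination of feasible points.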
 
==Properties==
While competing methods such as [[gradient descent]] for constrained optimization require a [[Projection (mathematics)|projection step]] back to the feasible set in each iteration, the Frank–Wolfe algorithm only needs the solution of a linear problem over the same set in each iteration, and automatically stays in the feasible set.
If the feasible set is given by a set of linear constraints, then the subproblem to be solved in each iteration becomes a [[linear programming|linear program]].
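For some simple feasible sets this per-iteration linear program even has a closed-form solution. As a hedged illustration (the box ___domain is an example choice, not the general case), over a box <math>\{ \mathbf{x} : \mathbf{l} \le \mathbf{x} \le \mathbf{u} \}</math> the problem separates coordinate-wise:

```python
# Direction-finding subproblem min s.g over the box {l <= s <= u}:
# each coordinate is independent, so pick the lower bound where the
# gradient coordinate is positive and the upper bound otherwise.

def box_lmo(g, l, u):
    return [li if gi > 0 else ui for gi, li, ui in zip(g, l, u)]

# Example: gradient (1, -2, 0) over the box [-1, 1]^3.
s = box_lmo([1.0, -2.0, 0.0], [-1.0, -1.0, -1.0], [1.0, 1.0, 1.0])
# s == [-1.0, 1.0, 1.0]
```

For general polyhedral constraints the subproblem would instead be handed to an off-the-shelf linear-programming solver.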
 
The convergence of the Frank–Wolfe algorithm is sublinear in general: the error to the optimum is <math>O(1/k)</math> after <math>k</math> iterations. The same convergence rate can also be shown if the sub-problems are only solved approximately<ref>{{cite doi|10.1016/0022-247X(78)90137-3}}</ref>.
The algorithm generally makes good progress towards the optimum during the first few iterations, but convergence often slows down substantially when close to the minimum point. For this reason the algorithm is perhaps best used to find an approximate solution. In practice, the convergence rate has been observed to improve in the presence of many constraints.<ref>"Nonlinear Programming", Dimitri Bertsekas, 2003, page 222. Athena Scientific, ISBN 1-886529-00-0.</ref>
 
The iterates of the algorithm can always be represented as a sparse convex combination of the extreme points of the feasible set, which has helped to the popularity of the algorithm for sparse greedy optimization in [[machine learning]] and [[signal processing]] problems<ref>{{cite doi|10.1145/1824777.1824783}}</ref>, as well as for example the optimization of [[flow network|minimum–cost flow]]s in [[transportation network]]s<ref>{{cite doi|10.1016/0191-2615(84)90029-8}}</ref>.
While the worst-case convergence rate of <math>O(1/k)</math> cannot be improved in general, faster convergence can be obtained for special problem classes, such as some strongly convex problems<ref>"Nonlinear Programming", Dimitri Bertsekas, 2003, page 222. Athena Scientific, ISBN 1-886529-00-0.</ref>.
 
== Notes ==
<references/>
 
==Bibliography==
*{{cite journal|last=Jaggi|first=Martin|title=Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization|journal=ICML 2013|url=http://m8j.net/math/revisited-FW.pdf}} (Overview paper)
*[http://www.math.chalmers.se/Math/Grundutb/CTH/tma946/0203/fw_eng.pdf The Frank-Wolfe algorithm] description
 
 
{{Optimization algorithms|convex}}
[[Category:Optimization algorithms and methods]]
[[Category:Iterative methods]]
[[Category:First order methods]]
[[Category:Gradient methods]]
 
[[nl:Frank-Wolfe-algoritme]]