The Frank–Wolfe algorithm is an [[iterative method|iterative]] [[First-order approximation|first-order]] [[Mathematical optimization|optimization]] [[algorithm]] for [[constrained optimization|constrained]] [[convex optimization]]. Also known as the '''conditional gradient method''',<ref>{{Cite doi|10.1016/0041-5553(66)90114-5}}</ref> the '''reduced gradient algorithm''' and the '''convex combination algorithm''', the method was originally proposed by [[Marguerite Frank]] and [[Philip Wolfe]] in 1956.<ref>{{cite journal |last1=Frank |first1=M. |last2=Wolfe |first2=P. |title=An algorithm for quadratic programming |journal=Naval Research Logistics Quarterly |volume=3 |number=1-2 |year=1956 |pages=95–110 |doi=10.1002/nav.3800030109}}</ref> In each iteration, the Frank–Wolfe algorithm considers a [[linear approximation]] of the objective function, and moves towards a minimizer of this linear function (taken over the same ___domain).
While competing methods such as [[gradient descent]] for constrained optimization require a [[Projection (mathematics)|projection step]] back to the feasible set in each iteration, the Frank–Wolfe algorithm only needs the solution of a linear optimization problem over the same set in each iteration, and automatically stays in the feasible set.
The convergence of the Frank–Wolfe algorithm is sublinear in general: the error to the optimum is <math>O(1/k)</math> after ''k'' iterations.
The iterates of the algorithm can always be represented as a sparse convex combination of the extreme points of the feasible set, which has contributed to the popularity of the algorithm for sparse greedy optimization in [[machine learning]] and [[signal processing]] problems<ref>{{cite doi|10.1145/1824777.1824783}}</ref>, as well as, for example, the optimization of [[flow network|minimum–cost flow]]s in [[transportation network]]s<ref>{{cite doi|10.1016/0191-2615(84)90029-8}}</ref>.
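This sparsity is easy to see directly: starting from an extreme point, each iteration mixes in at most one new extreme point, so after ''k'' iterations the iterate is a convex combination of at most ''k''&nbsp;+&nbsp;1 of them. A minimal sketch (an illustration, not part of the original presentation; it assumes the feasible set is the probability simplex, whose extreme points are the unit vectors):

```python
import numpy as np

rng = np.random.default_rng(0)
b = rng.random(1000)
b /= b.sum()                      # target point inside a 1000-dimensional simplex

x = np.zeros(1000)
x[0] = 1.0                        # start at an extreme point (a unit vector)
for k in range(10):
    g = x - b                     # gradient of f(x) = 0.5 * ||x - b||^2
    s = np.zeros_like(x)
    s[np.argmin(g)] = 1.0         # extreme point solving the linear subproblem
    x += (2.0 / (k + 2)) * (s - x)

# After 10 iterations, at most 11 of the 1000 coordinates are nonzero.
assert np.count_nonzero(x) <= 11
```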
==Problem statement==
:Minimize <math> f(\mathbf{x})</math>
:subject to <math> \mathbf{x} \in \mathcal{D}</math>,
where the function f is [[Convex function|convex]] and [[differentiable function|differentiable]], and the ___domain <math>\mathcal{D}</math> is a [[convex set|convex]] and [[compact space|compact]] set.
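As an illustration (this instance is not part of the original formulation), consider a quadratic objective over the probability simplex:
:Minimize <math>f(\mathbf{x}) = \tfrac{1}{2}\|\mathbf{x} - \mathbf{b}\|_2^2</math>
:subject to <math>\mathbf{x} \in \Delta_n = \{\mathbf{x} \in \mathbb{R}^n : x_i \ge 0,\; \textstyle\sum_i x_i = 1\}</math>,
for a fixed vector <math>\mathbf{b}</math>. Here <math>\nabla f(\mathbf{x}) = \mathbf{x} - \mathbf{b}</math>, and minimizing a linear function over <math>\Delta_n</math> amounts to selecting the vertex <math>\mathbf{e}_i</math> whose gradient coordinate is smallest.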
==Algorithm==
[[File:Frank-Wolfe-Algorithm.png|thumbnail|right|A step of the Frank-Wolfe algorithm]]
:Initialization. Let <math>k \leftarrow 0</math> and let <math>\mathbf{x}_0 \!</math> be any point in <math>\mathcal{D}</math>.
:'''Step 1.''' Direction-finding subproblem. Find <math>\mathbf{s}</math> solving
::Minimize <math>\mathbf{s}^T \nabla f(\mathbf{x}_k)</math>
::Subject to <math>\mathbf{s} \in \mathcal{D}</math>
:Interpretation: Minimize the linear approximation of the problem given by the first-order [[Taylor series|Taylor approximation]] of f around <math>\mathbf{x}_k \!</math>.
:'''Step 2.''' Step size determination. Set <math>\gamma \leftarrow \frac{2}{k+2}</math>, or alternatively find <math>\gamma \!</math> that minimizes <math> f(\mathbf{x}_k+\gamma(\mathbf{s}-\mathbf{x}_k))</math> subject to <math>0 \le \gamma \le 1</math>.
:'''Step 3.''' Update. Let <math>\mathbf{x}_{k+1}\leftarrow \mathbf{x}_k+\gamma(\mathbf{s}-\mathbf{x}_k)</math>, let <math>k \leftarrow k+1</math> and go back to Step 1.
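The steps above can be sketched in a few lines of code. The following is an illustrative sketch, not a reference implementation: it assumes the feasible set is the probability simplex, where the direction-finding subproblem of Step 1 has the closed-form solution of picking the vertex with the smallest gradient coordinate, and it uses the predetermined step size <math>\gamma = \frac{2}{k+2}</math>.

```python
import numpy as np

def frank_wolfe(grad, lmo, x0, num_iters=2000):
    """Generic Frank-Wolfe loop with the step size gamma = 2/(k+2).

    grad -- returns the gradient of f at x
    lmo  -- linear minimization oracle: returns argmin over s in D of s^T g
    """
    x = x0.astype(float)
    for k in range(num_iters):
        s = lmo(grad(x))            # Step 1: direction-finding subproblem
        gamma = 2.0 / (k + 2.0)     # Step 2: predetermined step size
        x = x + gamma * (s - x)     # Step 3: convex-combination update
    return x

def simplex_lmo(g):
    """On the probability simplex, min_s s^T g is attained at the
    vertex e_i with i = argmin_i g_i."""
    s = np.zeros_like(g)
    s[np.argmin(g)] = 1.0
    return s

# Minimize f(x) = 0.5 * ||x - b||^2 over the simplex; since b lies in the
# simplex, the constrained minimizer is b itself.
b = np.array([0.2, 0.3, 0.5])
x_star = frank_wolfe(lambda x: x - b, simplex_lmo, np.array([1.0, 0.0, 0.0]))
```

Because every update is a convex combination of feasible points, the iterate never leaves the simplex; no projection step is needed. With the line-search variant of Step 2, one would instead minimize f along the segment from <math>\mathbf{x}_k</math> to <math>\mathbf{s}</math>.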
==Properties==
While the worst-case convergence rate of <math>O(1/k)</math> cannot be improved in general, faster convergence can be obtained for special problem classes, such as some strongly convex problems.<ref>"Nonlinear Programming", Dimitri Bertsekas, 2003, page 222. Athena Scientific, ISBN 1-886529-00-0.</ref>

If the feasible set is given by a set of linear constraints, then the subproblem to be solved in each iteration becomes a [[linear programming|linear program]].

== Notes ==
<references/>
==Bibliography==
*{{cite journal|last=Jaggi|first=Martin|title=Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization|journal=ICML 2013|url=http://m8j.net/math/revisited-FW.pdf}} (Overview paper)
==External links==
*[http://www.math.chalmers.se/Math/Grundutb/CTH/tma946/0203/fw_eng.pdf The Frank-Wolfe algorithm] description
{{Optimization algorithms|convex}}
[[Category:Optimization algorithms and methods]]
[[Category:Iterative methods]]
[[Category:First order methods]]
[[Category:Gradient methods]]
[[nl:Frank-Wolfe-algoritme]]