Frank–Wolfe algorithm: Difference between revisions

Content deleted Content added
link to convex set (minor)
Line 1:
The '''Frank–Wolfe algorithm''' is a simple [[iterative method|iterative]] [[First-order approximation|first-order]] [[Mathematical optimization|optimization]] [[algorithm]] for [[constrained optimization|constrained]] [[convex optimization]]. Also known as the '''conditional gradient method''',<ref>{{Cite doi|10.1016/0041-5553(66)90114-5|noedit}}</ref> '''reduced gradient algorithm''' and the '''convex combination algorithm''', the method was originally proposed by [[Marguerite Frank]] and [[Philip Wolfe]] in&nbsp;1956.<ref>{{cite journal doi|last1=Frank |first1=M. |last2=Wolfe |first2=P. |title=An algorithm for quadratic programming |journal=Naval Research Logistics Quarterly |volume=3 |number=1-2 |year=1956 |pages=95–110|doi=10.1002/nav.3800030109|noedit}}</ref> In each iteration, the Frank–Wolfe algorithm considers a [[linear approximation]] of the objective function, and moves slightly towards a minimizer of this linear function (taken over the same ___domain).
 
==Problem statement==
Line 24:
While competing methods such as [[gradient descent]] for constrained optimization require a [[Projection (mathematics)|projection step]] back to the feasible set in each iteration, the Frank–Wolfe algorithm only needs the solution of a linear problem over the same set in each iteration, and automatically stays in the feasible set.
 
The convergence of the Frank–Wolfe algorithm is sublinear in general: the error to the optimum is <math>O(1/k)</math> after k iterations. The same convergence rate can also be shown if the sub-problems are only solved approximately.<ref>{{cite doi|10.1016/0022-247X(78)90137-3|noedit}}</ref>
 
The iterates of the algorithm can always be represented as a sparse convex combination of the extreme points of the feasible set, which has helped to the popularity of the algorithm for sparse greedy optimization in [[machine learning]] and [[signal processing]] problems,<ref>{{cite doi|10.1145/1824777.1824783|noedit}}</ref> as well as for example the optimization of [[flow network|minimum–cost flow]]s in [[transportation network]]s.<ref>{{cite doi|10.1016/0191-2615(84)90029-8|noedit}}</ref>
 
If the feasible set is given by a set of linear constraints, then the subproblem to be solved in each iteration becomes a [[linear programming|linear program]].
 
While the worst-case convergence rate with <math>O(1/k)</math> can not be improved in general, faster convergence can be obtained for special problem classes, such as some strongly convex problems.<ref>"{{Cite book|title=Nonlinear Programming",|first= Dimitri |last=Bertsekas,|year= 2003, |page= 222.|publisher= Athena Scientific,| ISBNisbn =1-886529-00-0.}}</ref>
 
==Notes==
{{Reflist}}
<references/>
 
==Bibliography==
*{{cite journal|last=Jaggi|first=Martin|title=Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization|journal=ICMLJournal of Machine Learning Research: Workshop and Conference Proceedings |volume=28|issue=1|pages=427–435|year= 2013 |url=http://m8jjmlr.netcsail.mit.edu/proceedings/papers/mathv28/revisited-FWjaggi13.pdfhtml}} (Overview paper)
*[http://www.math.chalmers.se/Math/Grundutb/CTH/tma946/0203/fw_eng.pdf The Frank-Wolfe algorithm] description