Frank–Wolfe algorithm: Difference between revisions

Content deleted Content added
m Disambiguating links to Transportation network (link changed to Transport network) using DisamAssist.
Dexbot (talk | contribs)
m Bot: Deprecating Template:Cite doi and some minor fixes
Line 1:
The '''Frank–Wolfe algorithm''' is an [[iterative method|iterative]] [[First-order approximation|first-order]] [[Mathematical optimization|optimization]] [[algorithm]] for [[constrained optimization|constrained]] [[convex optimization]]. Also known as the '''conditional gradient method''',<ref>{{Cite doijournal | last1 = Levitin | first1 = E. S. | last2 = Polyak | first2 = B. T. | doi = 10.1016/0041-5553(66)90114-5 |noedit title = Constrained minimization methods | journal = USSR Computational Mathematics and Mathematical Physics | volume = 6 | issue = 5 | pages = 1 | year = 1966 | pmid = | pmc = }}</ref> '''reduced gradient algorithm''' and the '''convex combination algorithm''', the method was originally proposed by [[Marguerite Frank]] and [[Philip Wolfe (mathematician)|Philip Wolfe]] in&nbsp;1956.<ref>{{citeCite journal doi| last1 = Frank | first1 = M. | last2 = Wolfe | first2 = P. | doi = 10.1002/nav.3800030109 |noedit title = An algorithm for quadratic programming | journal = Naval Research Logistics Quarterly | volume = 3 | pages = 95 | year = 1956 | pmid = | pmc = }}</ref> In each iteration, the Frank–Wolfe algorithm considers a [[linear approximation]] of the objective function, and moves slightly towards a minimizer of this linear function (taken over the same ___domain).
 
==Problem statement==
Line 24:
While competing methods such as [[gradient descent]] for constrained optimization require a [[Projection (mathematics)|projection step]] back to the feasible set in each iteration, the Frank–Wolfe algorithm only needs the solution of a linear problem over the same set in each iteration, and automatically stays in the feasible set.
 
The convergence of the Frank–Wolfe algorithm is sublinear in general: the error to the optimum is <math>O(1/k)</math> after ''k'' iterations. The same convergence rate can also be shown if the sub-problems are only solved approximately.<ref>{{citeCite journal doi| last1 = Dunn | first1 = J. C. | last2 = Harshbarger | first2 = S. | doi = 10.1016/0022-247X(78)90137-3 |noedit title = Conditional gradient algorithms with open loop step size rules | journal = Journal of Mathematical Analysis and Applications | volume = 62 | issue = 2 | pages = 432 | year = 1978 | pmid = | pmc = }}</ref>
 
The iterates of the algorithm can always be represented as a sparse convex combination of the extreme points of the feasible set, which has helped to the popularity of the algorithm for sparse greedy optimization in [[machine learning]] and [[signal processing]] problems,<ref>{{citeCite journal doi| last1 = Clarkson | first1 = K. L. | title = Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm | doi = 10.1145/1824777.1824783 |noedit journal = ACM Transactions on Algorithms | volume = 6 | issue = 4 | pages = 1 | year = 2010 | pmid = | pmc = }}</ref> as well as for example the optimization of [[flow network|minimum–cost flow]]s in [[Transport network|transportation network]]s.<ref>{{citeCite journal doi| last1 = Fukushima | first1 = M. | title = A modified Frank-Wolfe algorithm for solving the traffic assignment problem | doi = 10.1016/0191-2615(84)90029-8 |noedit journal = Transportation Research Part B: Methodological | volume = 18 | issue = 2 | pages = 169–153 | year = 1984 | pmid = | pmc = }}</ref>
 
If the feasible set is given by a set of linear constraints, then the subproblem to be solved in each iteration becomes a [[linear programming|linear program]].