The '''Frank–Wolfe algorithm''' is an [[iterative method|iterative]] [[First-order approximation|first-order]] [[Mathematical optimization|optimization]] [[algorithm]] for [[constrained optimization|constrained]] [[convex optimization]]. Also known as the '''conditional gradient method''',<ref>{{Cite journal | last1 = Levitin | first1 = E. S. | last2 = Polyak | first2 = B. T. | doi = 10.1016/0041-5553(66)90114-5 | title = Constrained minimization methods | journal = USSR Computational Mathematics and Mathematical Physics | volume = 6 | issue = 5 | pages = 1 | year = 1966 }}</ref> the '''reduced gradient algorithm''' and the '''convex combination algorithm''', the method was originally proposed by [[Marguerite Frank]] and [[Philip Wolfe (mathematician)|Philip Wolfe]] in 1956.<ref>{{Cite journal | last1 = Frank | first1 = M. | last2 = Wolfe | first2 = P. | doi = 10.1002/nav.3800030109 | title = An algorithm for quadratic programming | journal = Naval Research Logistics Quarterly | volume = 3 | issue = 1–2 | pages = 95–110 | year = 1956 }}</ref> In each iteration, the Frank–Wolfe algorithm considers a linear approximation of the objective function, and moves towards a minimizer of this linear function (taken over the same domain).
==Problem statement==
Suppose <math>\mathcal{D}</math> is a [[compact space|compact]] [[convex set]] in a [[vector space]] and <math>f \colon \mathcal{D} \to \mathbb{R}</math> is a [[convex function|convex]], [[differentiable function|differentiable]] real-valued function. The Frank–Wolfe algorithm solves the [[optimization problem]]

:Minimize <math>f(\mathbf{x})</math>
:subject to <math>\mathbf{x} \in \mathcal{D}</math>.
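The following is a minimal Python sketch of the generic iteration, given only gradient access and a linear minimization oracle over <math>\mathcal{D}</math>; the oracle name <code>lmo</code> and the open-loop step size <math>\gamma_k = \tfrac{2}{k+2}</math> are illustrative default choices, not the only possibilities:

<syntaxhighlight lang="python">
import numpy as np

def frank_wolfe(grad, lmo, x0, num_iters=100):
    """Sketch of the Frank-Wolfe (conditional gradient) loop.

    grad: callable returning the gradient of f at x.
    lmo:  linear minimization oracle returning argmin over D of <s, g>.
    x0:   feasible starting point in D.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(num_iters):
        g = grad(x)                 # first-order information only
        s = lmo(g)                  # direction-finding subproblem
        gamma = 2.0 / (k + 2.0)     # standard open-loop step size
        x = x + gamma * (s - x)     # convex combination keeps x feasible
    return x
</syntaxhighlight>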
==Properties==
The convergence of the Frank–Wolfe algorithm is sublinear in general: the error of the objective value relative to the optimum is <math>O(1/k)</math> after ''k'' iterations, as long as the gradient is [[Lipschitz continuity|Lipschitz continuous]] with respect to some norm. The same convergence rate can also be shown if the sub-problems are only solved approximately.<ref>{{Cite journal | last1 = Dunn | first1 = J. C. | last2 = Harshbarger | first2 = S. | doi = 10.1016/0022-247X(78)90137-3 | title = Conditional gradient algorithms with open loop step size rules | journal = Journal of Mathematical Analysis and Applications | volume = 62 | issue = 2 | pages = 432 | year = 1978 }}</ref>
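For concreteness, with the step size <math>\gamma_k = \tfrac{2}{k+2}</math> used in the sketch above, a representative form of this guarantee (the exact constant varies between analyses) is

:<math>f(\mathbf{x}_k) - f(\mathbf{x}^*) \le \frac{2 L \operatorname{diam}(\mathcal{D})^2}{k+2},</math>

where <math>L</math> is the Lipschitz constant of the gradient and <math>\operatorname{diam}(\mathcal{D})</math> is the diameter of the feasible set in the corresponding norm.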
The iterates of the algorithm can always be represented as a sparse convex combination of the extreme points of the feasible set, which has contributed to the popularity of the algorithm for sparse greedy optimization in [[machine learning]] and [[signal processing]] problems.<ref>{{Cite journal | last1 = Clarkson | first1 = K. L. | title = Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm | doi = 10.1145/1824777.1824783 | journal = ACM Transactions on Algorithms | volume = 6 | issue = 4 | year = 2010 }}</ref>
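As an illustrative special case (an assumption of this sketch, not discussed in the text above): if <math>\mathcal{D}</math> is an <math>\ell_1</math> ball, its extreme points are the signed, scaled coordinate vectors, so starting from an extreme point the iterate after ''k'' steps has at most <math>k+1</math> nonzero entries. The linear minimization oracle then reduces to picking a single coordinate:

<syntaxhighlight lang="python">
import numpy as np

def lmo_l1_ball(g, radius=1.0):
    """Linear minimization oracle for the l1 ball of the given radius.

    argmin of <s, g> over {s : ||s||_1 <= radius} is attained at a
    signed, scaled coordinate vector, so each Frank-Wolfe step adds
    at most one new nonzero coordinate to the iterate."""
    i = np.argmax(np.abs(g))        # coordinate with largest |gradient|
    s = np.zeros_like(g, dtype=float)
    s[i] = -radius * np.sign(g[i])  # move against the gradient's sign
    return s
</syntaxhighlight>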
If the feasible set is given by a set of linear constraints, then the subproblem to be solved in each iteration becomes a [[linear programming|linear program]].
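For example (a standard special case, assumed here for illustration), if <math>\mathcal{D}</math> is the probability simplex <math>\Delta = \{\mathbf{x} \ge 0 : \textstyle\sum_i x_i = 1\}</math>, the linear program

:<math>\min_{\mathbf{s} \in \Delta} \mathbf{s}^T \nabla f(\mathbf{x})</math>

is attained at the vertex <math>\mathbf{e}_i</math> with <math>i = \arg\min_j \left(\nabla f(\mathbf{x})\right)_j</math>, so it reduces to finding the smallest coordinate of the gradient rather than calling a general-purpose LP solver.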
While the worst-case convergence rate of <math>O(1/k)</math> cannot be improved in general, faster convergence can be obtained for special problem classes, such as some strongly convex problems.<ref>{{Cite book | title = Nonlinear Programming | first = Dimitri | last = Bertsekas | year = 1999 | page = 215 | publisher = Athena Scientific | isbn = 978-1-886529-00-7 }}</ref>
==Lower bounds on the solution value, and primal-dual analysis==
Since <math>f</math> is [[convex function|convex]], for any two points <math>\mathbf{x}, \mathbf{y} \in \mathcal{D}</math> we have

:<math>f(\mathbf{y}) \geq f(\mathbf{x}) + (\mathbf{y} - \mathbf{x})^T \nabla f(\mathbf{x}).</math>

This also holds for the (unknown) optimal solution <math>\mathbf{x}^*</math>, so the best lower bound obtainable from a given point <math>\mathbf{x}</math> is

:<math>f(\mathbf{x}^*) \geq \min_{\mathbf{y} \in \mathcal{D}} \left\{ f(\mathbf{x}) + (\mathbf{y} - \mathbf{x})^T \nabla f(\mathbf{x}) \right\}.</math>

The latter minimization is exactly the direction-finding subproblem solved in every iteration of the Frank–Wolfe algorithm, so its solution <math>\mathbf{s}_k</math> in the <math>k</math>-th iteration can be used to maintain increasing lower bounds <math>l_k</math> at no extra cost, by setting <math>l_0 = -\infty</math> and

:<math>l_k := \max\left(l_{k-1},\; f(\mathbf{x}_k) + (\mathbf{s}_k - \mathbf{x}_k)^T \nabla f(\mathbf{x}_k)\right).</math>
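A sketch of how this bookkeeping might look in code, extending the hypothetical loop from the sketch above (the function value <code>f</code> is now also needed):

<syntaxhighlight lang="python">
import numpy as np

def frank_wolfe_with_bounds(f, grad, lmo, x0, num_iters=100):
    """Frank-Wolfe sketch that also tracks the lower bound l_k.

    Each direction-finding step yields, for free, the linearization
    value f(x) + <s - x, grad f(x)>, which lower-bounds f(x*)."""
    x = np.asarray(x0, dtype=float)
    lower = -np.inf
    for k in range(num_iters):
        g = grad(x)
        s = lmo(g)
        lower = max(lower, f(x) + (s - x) @ g)  # l_k update
        x = x + (2.0 / (k + 2.0)) * (s - x)
    return x, lower
</syntaxhighlight>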