Revision as of 23:04, 2 February 2013 edit 193.52.24.2 (talk) link to convex set (minor) ← Previous edit		Revision as of 20:39, 6 March 2013 edit undo AManWithNoPlan (talk \| contribs) Extended confirmed users 100,078 edits \| Assisted by Citation bot r419 Next edit →
Line 1: The '''Frank–Wolfe algorithm''' is a simple [[iterative method\|iterative]] [[First-order approximation\|first-order]] [[Mathematical optimization\|optimization]] [[algorithm]] for [[constrained optimization\|constrained]] [[convex optimization]]. Also known as the '''conditional gradient method''',<ref>{{Cite doi\|10.1016/0041-5553(66)90114-5\|noedit}}</ref> '''reduced gradient algorithm''' and the '''convex combination algorithm''', the method was originally proposed by [[Marguerite Frank]] and [[Philip Wolfe]] in 1956.<ref>{{cite ~~journal~~ doi\|~~last1=Frank \|first1=M. \|last2=Wolfe \|first2=P. \|title=An algorithm for quadratic programming \|journal=Naval Research Logistics Quarterly \|volume=3 \|number=1-2 \|year=1956 \|pages=95–110\|doi=~~10.1002/nav.3800030109\|noedit}}</ref> In each iteration, the Frank–Wolfe algorithm considers a [[linear approximation]] of the objective function, and moves slightly towards a minimizer of this linear function (taken over the same ___domain). ==Problem statement== Line 24: While competing methods such as [[gradient descent]] for constrained optimization require a [[Projection (mathematics)\|projection step]] back to the feasible set in each iteration, the Frank–Wolfe algorithm only needs the solution of a linear problem over the same set in each iteration, and automatically stays in the feasible set. The convergence of the Frank–Wolfe algorithm is sublinear in general: the error to the optimum is <math>O(1/k)</math> after k iterations. The same convergence rate can also be shown if the sub-problems are only solved approximately.<ref>{{cite doi\|10.1016/0022-247X(78)90137-3\|noedit}}</ref> The iterates of the algorithm can always be represented as a sparse convex combination of the extreme points of the feasible set, which has helped to the popularity of the algorithm for sparse greedy optimization in [[machine learning]] and [[signal processing]] problems,<ref>{{cite doi\|10.1145/1824777.1824783\|noedit}}</ref> as well as for example the optimization of [[flow network\|minimum–cost flow]]s in [[transportation network]]s.<ref>{{cite doi\|10.1016/0191-2615(84)90029-8\|noedit}}</ref> If the feasible set is given by a set of linear constraints, then the subproblem to be solved in each iteration becomes a [[linear programming\|linear program]]. While the worst-case convergence rate with <math>O(1/k)</math> can not be improved in general, faster convergence can be obtained for special problem classes, such as some strongly convex problems.<ref>"{{Cite book\|title=Nonlinear Programming",\|first= Dimitri \|last=Bertsekas,\|year= 2003, \|page= 222.\|publisher= Athena Scientific,\| ~~ISBN~~isbn =1-886529-00-0.}}</ref> ==Notes== {{Reflist}} ~~<references/>~~ ==Bibliography== {{cite journal\|last=Jaggi\|first=Martin\|title=Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization\|journal=~~ICML~~Journal of Machine Learning Research: Workshop and Conference Proceedings \|volume=28\|issue=1\|pages=427–435\|year= 2013 \|url=http://~~m8j~~jmlr.~~net~~csail.mit.edu/proceedings/papers/~~math~~v28/~~revisited-FW~~jaggi13.~~pdf~~html}} (Overview paper) [http://www.math.chalmers.se/Math/Grundutb/CTH/tma946/0203/fw_eng.pdf The Frank-Wolfe algorithm] description

Frank–Wolfe algorithm: Difference between revisions