Revision as of 16:10, 19 April 2020 by 2409:4055:11d:fbc4:c544:81a:655a:a811 (no edit summary), compared with the revision as of 18:46, 6 December 2020 by Monkbot (Task 18 (cosmetic): eval 8 templates: del empty params (10×); del |ref=harv (1×); del |postscript= (1×); tag: AWB).
The '''Frank–Wolfe algorithm''' is an [[iterative method|iterative]] [[First-order approximation|first-order]] [[Mathematical optimization|optimization]] [[algorithm]] for [[constrained optimization|constrained]] [[convex optimization]]. Also known as the '''conditional gradient method''',<ref>{{Cite journal | last1 = Levitin | first1 = E. S. | last2 = Polyak | first2 = B. T. | doi = 10.1016/0041-5553(66)90114-5 | title = Constrained minimization methods | journal = USSR Computational Mathematics and Mathematical Physics | volume = 6 | issue = 5 | pages = 1 | year = 1966 }}</ref> '''reduced gradient algorithm''' and the '''convex combination algorithm''', the method was originally proposed by [[Marguerite Frank]] and [[Philip Wolfe (mathematician)|Philip Wolfe]] in 1956.<ref>{{Cite journal | last1 = Frank | first1 = M. | last2 = Wolfe | first2 = P. | doi = 10.1002/nav.3800030109 | title = An algorithm for quadratic programming | journal = Naval Research Logistics Quarterly | volume = 3 | issue = 1–2 | pages = 95–110 | year = 1956 }}</ref> In each iteration, the Frank–Wolfe algorithm considers a [[linear approximation]] of the objective function, and moves towards a minimizer of this linear function (taken over the same domain).

==Problem statement==

While competing methods such as [[gradient descent]] for constrained optimization require a [[Projection (mathematics)|projection step]] back to the feasible set in each iteration, the Frank–Wolfe algorithm only needs the solution of a linear problem over the same set in each iteration, and automatically stays in the feasible set. The convergence of the Frank–Wolfe algorithm is sublinear in general: the gap between the objective value and the optimum is <math>O(1/k)</math> after ''k'' iterations, so long as the gradient is [[Lipschitz continuity|Lipschitz continuous]] with respect to some norm.
The same convergence rate can also be shown if the sub-problems are only solved approximately.<ref>{{Cite journal | last1 = Dunn | first1 = J. C. | last2 = Harshbarger | first2 = S. | doi = 10.1016/0022-247X(78)90137-3 | title = Conditional gradient algorithms with open loop step size rules | journal = Journal of Mathematical Analysis and Applications | volume = 62 | issue = 2 | pages = 432 | year = 1978 | doi-access = free }}</ref> The iterates of the algorithm can always be represented as a sparse convex combination of the extreme points of the feasible set, which has contributed to the popularity of the algorithm for sparse greedy optimization in [[machine learning]] and [[signal processing]] problems,<ref>{{Cite journal | last1 = Clarkson | first1 = K. L. | title = Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm | doi = 10.1145/1824777.1824783 | journal = ACM Transactions on Algorithms | volume = 6 | issue = 4 | pages = 1–30 | year = 2010 | citeseerx = 10.1.1.145.9299 }}</ref> as well as, for example, the optimization of [[flow network|minimum–cost flow]]s in [[Transport network|transportation network]]s.<ref>{{Cite journal | last1 = Fukushima | first1 = M. | title = A modified Frank-Wolfe algorithm for solving the traffic assignment problem | doi = 10.1016/0191-2615(84)90029-8 | journal = Transportation Research Part B: Methodological | volume = 18 | issue = 2 | pages = 169–177 | year = 1984 }}</ref> If the feasible set is given by a set of linear constraints, then the subproblem to be solved in each iteration becomes a [[linear programming|linear program]].
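
The properties above can be illustrated with a minimal sketch, which is not taken from the cited sources. It assumes the feasible set is the probability simplex, so the linear subproblem reduces to picking the vertex with the smallest gradient coordinate, and it assumes the common open-loop step size 2/(''k''+2). The names <code>frank_wolfe_simplex</code>, <code>grad</code> and <code>b</code> are hypothetical and chosen only for this example.

```python
from typing import Callable, List


def frank_wolfe_simplex(
    grad: Callable[[List[float]], List[float]],
    n: int,
    num_iters: int = 1000,
) -> List[float]:
    """Sketch of Frank-Wolfe over the probability simplex in R^n.

    Assumptions (not from the article): the feasible set is the simplex,
    and the step size is the open-loop rule gamma_k = 2 / (k + 2).
    """
    # Start at a vertex of the simplex, which is trivially feasible.
    x = [0.0] * n
    x[0] = 1.0
    for k in range(num_iters):
        g = grad(x)
        # Linear subproblem: minimize <g, s> over the simplex.
        # The minimum is attained at the vertex e_i with the smallest g[i].
        i = min(range(n), key=lambda j: g[j])
        gamma = 2.0 / (k + 2.0)
        # Convex combination of the current iterate and the vertex e_i.
        # No projection is needed: the result stays in the simplex.
        x = [(1.0 - gamma) * xj for xj in x]
        x[i] += gamma
    return x


def objective(x: List[float], b: List[float]) -> float:
    """Example objective f(x) = 0.5 * ||x - b||^2 (an assumption)."""
    return 0.5 * sum((xj - bj) ** 2 for xj, bj in zip(x, b))


# Example target inside the simplex, so the optimal value is 0.
b = [0.1, 0.6, 0.3]
x = frank_wolfe_simplex(lambda v: [vj - bj for vj, bj in zip(v, b)], len(b))
print(x, objective(x, b))
```

Each iteration adds at most one new vertex to the convex combination, so after a few steps in a high-dimensional problem the iterate has only a few non-zero entries. For a feasible set described by general linear constraints, the vertex selection step would be replaced by a call to a linear programming solver.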
* {{cite journal|last=Jaggi|first=Martin|title=Revisiting Frank–Wolfe: Projection-Free Sparse Convex Optimization|journal=Journal of Machine Learning Research: Workshop and Conference Proceedings |volume=28|issue=1|pages=427–435|year= 2013 |url=http://jmlr.csail.mit.edu/proceedings/papers/v28/jaggi13.html}} (Overview paper)
* [http://www.math.chalmers.se/Math/Grundutb/CTH/tma946/0203/fw_eng.pdf The Frank–Wolfe algorithm] description
* {{Cite book | last1=Nocedal | first1=Jorge | last2=Wright | first2=Stephen J. | title=Numerical Optimization | publisher=[[Springer-Verlag]] | location=Berlin, New York | edition=2nd | isbn=978-0-387-30303-1 | year=2006 }}.

==External links==

Frank–Wolfe algorithm: Difference between revisions