Content deleted Content added
No edit summary |
The direction-finding subproblem and the update rule did not comply with each other. Either x_k +s in D in the subproblem and x_k+1 <-- x_k + \alpha s in the update or s in D in the subproblem and x_k+1 <-- x_k + \alpha (s - x_k) in the update are used. See Jaggi (2013) |
||
(11 intermediate revisions by 11 users not shown) | |||
Line 1:
{{Short description|Optimization algorithm}}
The '''Frank–Wolfe algorithm''' is an [[iterative method|iterative]] [[First-order approximation|first-order]] [[Mathematical optimization|optimization]] [[algorithm]] for [[constrained optimization|constrained]] [[convex optimization]]. Also known as the '''conditional gradient method''',<ref>{{Cite journal | last1 = Levitin | first1 = E. S. | last2 = Polyak | first2 = B. T. | doi = 10.1016/0041-5553(66)90114-5 | title = Constrained minimization methods | journal = USSR Computational Mathematics and Mathematical Physics | volume = 6 | issue = 5 | pages = 1 | year = 1966
==Problem statement==
Line 14 ⟶ 15:
:'''Step 1.''' ''Direction-finding subproblem:'' Find <math>\mathbf{s}_k</math> solving
::Minimize <math> \mathbf{s}^T \nabla f(\mathbf{x}_k)</math>
::Subject to <math> \mathbf{s} \in \mathcal{D}</math>
:''(Interpretation: Minimize the linear approximation of the problem given by the first-order [[Taylor series|Taylor approximation]] of <math>f</math> around <math>\mathbf{x}_k \!</math> constrained to stay within <math>\mathcal{D}</math>.)''
:'''Step 2.''' ''Step size determination:'' Set <math>\
:'''Step 3.''' ''Update:'' Let <math>\mathbf{x}_{k+1}\leftarrow \mathbf{x}_k+\
==Properties==
While competing methods such as [[gradient descent]] for constrained optimization require a [[Projection (mathematics)|projection step]] back to the feasible set in each iteration, the Frank–Wolfe algorithm only needs the solution of a
The convergence of the Frank–Wolfe algorithm is sublinear in general: the error in the objective function to the optimum is <math>O(1/k)</math> after ''k'' iterations, so long as the gradient is [[Lipschitz continuity|Lipschitz continuous]] with respect to some norm. The same convergence rate can also be shown if the sub-problems are only solved approximately.<ref>{{Cite journal | last1 = Dunn | first1 = J. C. | last2 = Harshbarger | first2 = S. | doi = 10.1016/0022-247X(78)90137-3 | title = Conditional gradient algorithms with open loop step size rules | journal = Journal of Mathematical Analysis and Applications | volume = 62 | issue = 2 | pages = 432 | year = 1978
The
If the feasible set is given by a set of linear constraints, then the subproblem to be solved in each iteration becomes a [[linear programming|linear program]].
Line 69 ⟶ 70:
*{{cite journal|last=Jaggi|first=Martin|title=Revisiting Frank–Wolfe: Projection-Free Sparse Convex Optimization|journal=Journal of Machine Learning Research: Workshop and Conference Proceedings |volume=28|issue=1|pages=427–435|year= 2013 |url=http://jmlr.csail.mit.edu/proceedings/papers/v28/jaggi13.html}} (Overview paper)
*[http://www.math.chalmers.se/Math/Grundutb/CTH/tma946/0203/fw_eng.pdf The Frank–Wolfe algorithm] description
* {{Cite book | last1=Nocedal | first1=Jorge | last2=Wright | first2=Stephen J. | title=Numerical Optimization | publisher=[[Springer-Verlag]] | ___location=Berlin, New York | edition=2nd | isbn=978-0-387-30303-1 | year=2006
==External links==
*https://conditional-gradients.org/: a survey of Frank–Wolfe algorithms.
*[https://www.youtube.com/watch?v=24e08AX9Eww Marguerite Frank giving a personal account of the history of the algorithm]
|