{{Short description|Optimization algorithm}}
The '''Frank–Wolfe algorithm''' is an [[iterative method|iterative]] [[First-order approximation|first-order]] [[Mathematical optimization|optimization]] [[algorithm]] for [[constrained optimization|constrained]] [[convex optimization]]. Also known as the '''conditional gradient method''', the '''reduced gradient algorithm''' and the '''convex combination algorithm''', the method was originally proposed by [[Marguerite Frank]] and [[Philip Wolfe (mathematician)|Philip Wolfe]] in 1956. In each iteration, the Frank–Wolfe algorithm considers a [[linear approximation]] of the objective function, and moves towards a minimizer of this linear function (taken over the same ___domain).
==Problem statement==
Suppose <math>\mathcal{D}</math> is a [[compact space|compact]] [[convex set]] in a [[vector space]] and <math>f \colon \mathcal{D} \to \mathbb{R}</math> is a [[convex function|convex]], [[differentiable function|differentiable]] [[real-valued function]]. The Frank–Wolfe algorithm solves the [[optimization problem]]
:Minimize <math> f(\mathbf{x})</math>
:subject to <math> \mathbf{x} \in \mathcal{D}</math>.
==Algorithm==
[[File:Frank-Wolfe-Algorithm.png|thumb|right|A step of the Frank–Wolfe algorithm.]]
:''Initialization:'' Let <math>k \leftarrow 0</math>, and let <math>\mathbf{x}_0 \!</math> be any point in <math>\mathcal{D}</math>.
:'''Step 1.''' ''Direction-finding subproblem:'' Find <math>\mathbf{s}_k</math> solving
::Minimize <math> \mathbf{s}^T \nabla f(\mathbf{x}_k)</math>
::Subject to <math> \mathbf{s} \in \mathcal{D}</math>
:''(Interpretation: Minimize the linear approximation of the problem given by the first-order [[Taylor series|Taylor approximation]] of <math>f</math> around <math>\mathbf{x}_k \!</math>, constrained to stay within <math>\mathcal{D}</math>.)''
:'''Step 2.''' ''Step size determination:'' Set <math>\alpha \leftarrow \frac{2}{k+2}</math>, or alternatively find <math>\alpha</math> that minimizes <math> f(\mathbf{x}_k+\alpha(\mathbf{s}_k -\mathbf{x}_k))</math> subject to <math>0 \le \alpha \le 1</math>.
:'''Step 3.''' ''Update:'' Let <math>\mathbf{x}_{k+1}\leftarrow \mathbf{x}_k+\alpha(\mathbf{s}_k-\mathbf{x}_k)</math>, let <math>k \leftarrow k+1</math> and go to Step 1.
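A minimal Python sketch of these three steps is given below. It is an illustration rather than part of the algorithm's specification: the feasible set is taken to be the probability simplex (whose direction-finding subproblem is solved by picking a coordinate vertex) and the objective is the least-squares function <math>f(\mathbf{x}) = \|A\mathbf{x} - \mathbf{b}\|^2</math>; both choices are assumptions made only for the example.
<syntaxhighlight lang="python">
# Illustrative sketch only (not part of the algorithm statement above):
# Frank–Wolfe on the probability simplex with f(x) = ||Ax - b||^2.
# The simplex is chosen because its direction-finding subproblem is trivial:
# min_s s^T g over the simplex is attained at the vertex e_i with i = argmin_i g_i.
import numpy as np

def frank_wolfe_simplex(A, b, iterations=100):
    n = A.shape[1]
    x = np.ones(n) / n                # Initialization: any feasible point
    for k in range(iterations):
        grad = 2 * A.T @ (A @ x - b)  # gradient of f at x_k
        i = np.argmin(grad)           # Step 1: linear minimization over the simplex
        s = np.zeros(n)
        s[i] = 1.0                    # vertex e_i of the simplex
        alpha = 2.0 / (k + 2)         # Step 2: default step size 2/(k+2)
        x = x + alpha * (s - x)       # Step 3: convex combination stays feasible
    return x

# Example usage on random data
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
print(frank_wolfe_simplex(A, b))
</syntaxhighlight>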
==Properties==
While competing methods such as [[gradient descent]] for constrained optimization require a [[Projection (mathematics)|projection step]] back to the feasible set in each iteration, the Frank–Wolfe algorithm only needs the solution of a linear subproblem over the same set in each iteration, and automatically stays in the feasible set.
The convergence of the Frank–Wolfe algorithm is sublinear in general: the error in the objective function to the optimum is <math>O(1/k)</math> after ''k'' iterations, so long as the gradient is [[Lipschitz continuity|Lipschitz continuous]] with respect to some norm. The same convergence rate can also be shown if the sub-problems are only solved approximately.
The iterates of the algorithm can always be represented as a sparse convex combination of the extreme points of the feasible set, which has helped the popularity of the algorithm for sparse greedy optimization in [[machine learning]] and [[signal processing]] problems.
If the feasible set is given by a set of linear constraints, then the subproblem to be solved in each iteration becomes a [[linear programming|linear program]].
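As an illustration (an assumption made for the example, not a prescription of the algorithm), if the feasible set is the polyhedron <math>\{\mathbf{s} : A_{ub}\mathbf{s} \le \mathbf{b}_{ub}\}</math> and is bounded, the direction-finding subproblem can be handed to any off-the-shelf LP solver, for instance SciPy's <code>linprog</code>:
<syntaxhighlight lang="python">
# Hedged sketch: solving the direction-finding subproblem with an LP solver
# when D = {s : A_ub s <= b_ub} (assumed bounded, so the minimum exists).
import numpy as np
from scipy.optimize import linprog

def lp_oracle(grad, A_ub, b_ub):
    """Return s in D minimizing grad^T s, i.e. the Frank-Wolfe subproblem."""
    res = linprog(c=grad, A_ub=A_ub, b_ub=b_ub, bounds=(None, None), method="highs")
    return res.x
</syntaxhighlight>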
While the worst-case convergence rate of <math>O(1/k)</math> cannot be improved in general, faster convergence can be obtained for special problem classes, such as some strongly convex problems.<ref>{{Cite book|title=Nonlinear Programming|first=Dimitri|last=Bertsekas|year=1999|publisher=Athena Scientific|edition=2nd}}</ref>
==Lower bounds on the solution value, and primal-dual analysis==
Since <math>f</math> is [[convex function|convex]], for any two points <math>\mathbf{x}, \mathbf{y} \in \mathcal{D}</math> we have:
:<math>
f(\mathbf{y}) \geq f(\mathbf{x}) + (\mathbf{y} - \mathbf{x})^T \nabla f(\mathbf{x})
</math>
This also holds for the (unknown) optimal solution <math>\mathbf{x}^*</math>. That is, <math>f(\mathbf{x}^*) \ge f(\mathbf{x}) + (\mathbf{x}^* - \mathbf{x})^T \nabla f(\mathbf{x})</math>. The best lower bound with respect to a given point <math>\mathbf{x}</math> is given by
:<math>
\begin{align}
f(\mathbf{x}^*)
& \ge f(\mathbf{x}) + (\mathbf{x}^* - \mathbf{x})^T \nabla f(\mathbf{x}) \\
&\geq \min_{\mathbf{y} \in D} \left\{ f(\mathbf{x}) + (\mathbf{y} - \mathbf{x})^T \nabla f(\mathbf{x}) \right\} \\
&= f(\mathbf{x}) - \mathbf{x}^T \nabla f(\mathbf{x}) + \min_{\mathbf{y} \in D} \mathbf{y}^T \nabla f(\mathbf{x})
\end{align}
</math>
The latter optimization problem is solved in every iteration of the Frank–Wolfe algorithm, therefore the solution <math>\mathbf{s}_k</math> of the direction-finding subproblem of the <math>k</math>-th iteration can be used to determine increasing lower bounds <math>l_k</math> during each iteration by setting <math>l_0 = - \infty</math> and
:<math>
l_k := \max (l_{k - 1}, f(\mathbf{x}_k) + (\mathbf{s}_k - \mathbf{x}_k)^T \nabla f(\mathbf{x}_k))
</math>
Such lower bounds on the unknown optimal value are important in practice because they can be used as a stopping criterion, and give an efficient certificate of the approximation quality in every iteration, since always <math>l_k \leq f(\mathbf{x}^*) \leq f(\mathbf{x}_k)</math>.
It has been shown that this corresponding [[duality gap]], that is, the difference between <math>f(\mathbf{x}_k)</math> and the lower bound <math>l_k</math>, decreases with the same convergence rate, i.e.,
<math>
f(\mathbf{x}_k) - l_k = O(1/k) .
</math>
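As an illustration (continuing the hypothetical simplex example above, not code taken from a reference), the lower bound <math>l_k</math> and the duality gap <math>f(\mathbf{x}_k) - l_k</math> can be maintained inside the loop and used as a stopping criterion:
<syntaxhighlight lang="python">
# Hedged sketch: Frank–Wolfe with the running lower bound l_k and the
# duality gap f(x_k) - l_k used as a stopping criterion (simplex example).
import numpy as np

def frank_wolfe_with_gap(A, b, tol=1e-6, max_iter=1000):
    n = A.shape[1]
    x = np.ones(n) / n
    f = lambda v: float(np.sum((A @ v - b) ** 2))
    lower = -np.inf                            # l_0 = -infinity
    for k in range(max_iter):
        grad = 2 * A.T @ (A @ x - b)
        i = np.argmin(grad)
        s = np.zeros(n)
        s[i] = 1.0
        # l_k = max(l_{k-1}, f(x_k) + (s_k - x_k)^T grad f(x_k))
        lower = max(lower, f(x) + (s - x) @ grad)
        gap = f(x) - lower                     # certifies l_k <= f(x*) <= f(x_k)
        if gap <= tol:
            break
        x = x + (2.0 / (k + 2)) * (s - x)
    return x, gap
</syntaxhighlight>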
==Bibliography==
*{{cite journal|last=Jaggi|first=Martin|year=2013|title=Revisiting Frank–Wolfe: Projection-Free Sparse Convex Optimization|journal=Proceedings of the 30th International Conference on Machine Learning (ICML)}}
*[http://www.math.chalmers.se/Math/Grundutb/CTH/tma946/0203/fw_eng.pdf The Frank–Wolfe algorithm] description
* {{Cite book | last1=Nocedal | first1=Jorge | last2=Wright | first2=Stephen J. | title=Numerical Optimization | publisher=[[Springer-Verlag]] | ___location=Berlin, New York | edition=2nd | isbn=978-0-387-30303-1 | year=2006 }}.
==External links==
*https://conditional-gradients.org/: a survey of Frank–Wolfe algorithms.
*[https://www.youtube.com/watch?v=24e08AX9Eww Marguerite Frank giving a personal account of the history of the algorithm]
== See also ==
* [[Proximal gradient methods]]
{{Optimization algorithms|convex}}