Frank–Wolfe algorithm

This is an old revision of this page, as edited by Martinjaggi (talk | contribs) at 10:47, 14 December 2012 (minor fix in the algorithm, better linking). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

The Frank–Wolfe algorithm is an iterative first-order optimization algorithm for constrained convex optimization. Also known as the conditional gradient method, the reduced gradient algorithm and the convex combination algorithm, the method was originally proposed by Marguerite Frank and Philip Wolfe in 1956.[1] In each iteration, the Frank–Wolfe algorithm considers a linear approximation of the objective function, and moves slightly towards a minimizer of this linear function (taken over the same ___domain).

While competing methods such as gradient descent for constrained optimization require a projection step back to the feasible set in each iteration, the Frank–Wolfe algorithm only needs the solution of a linear problem over the same set in each iteration, and automatically stays in the feasible set. The convergence of the Frank–Wolfe algorithm is sublinear in general: the error in the objective to the optimum is O(1/k) after k iterations. The iterates of the algorithm can always be represented as a sparse convex combination of the extreme points of the feasible set, which has helped the popularity of the algorithm for sparse greedy optimization in machine learning and signal processing problems,[2] as well as, for example, the optimization of minimum-cost flows in transportation networks.

Problem statement

Minimize f(x)
subject to x ∈ D.

where the function f is convex and differentiable, and the ___domain or feasible set D is convex and bounded.

Algorithm

[Figure: A step of the Frank–Wolfe algorithm]

Initialization. Let k ← 0, and let x_0 be any point in D.

Step 1. Direction-finding subproblem. Find s_k solving

Minimize s^T ∇f(x_k)
subject to s ∈ D.

Interpretation: Minimize the linear approximation of the problem given by the first-order Taylor approximation of f around x_k.

Step 2. Step size determination. Set γ ← 2/(k + 2), or alternatively find γ that minimizes f(x_k + γ(s_k − x_k)) subject to 0 ≤ γ ≤ 1.

Step 3. Update. Let x_{k+1} ← x_k + γ(s_k − x_k), let k ← k + 1, and go back to Step 1.
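The three steps above can be sketched in code. The following is a minimal NumPy illustration (not from the original article) that minimizes a convex quadratic f(x) = ½‖Ax − b‖² over the probability simplex D = {x : x ≥ 0, Σx_i = 1}. Over the simplex, the direction-finding subproblem of Step 1 has a closed-form answer: the minimizing vertex s is the standard basis vector e_i for the coordinate i where the gradient is smallest, so no general-purpose solver is needed. All function and variable names are illustrative.

```python
import numpy as np

def frank_wolfe_simplex(A, b, num_iters=200):
    """Sketch of Frank-Wolfe for min 0.5*||Ax - b||^2 over the simplex."""
    n = A.shape[1]
    x = np.ones(n) / n                    # Initialization: any feasible x_0
    for k in range(num_iters):
        grad = A.T @ (A @ x - b)          # gradient of 0.5*||Ax - b||^2
        # Step 1: direction-finding subproblem; over the simplex the
        # minimizer of s . grad is the vertex with the smallest gradient entry
        s = np.zeros(n)
        s[np.argmin(grad)] = 1.0
        # Step 2: the standard step size gamma = 2 / (k + 2)
        gamma = 2.0 / (k + 2)
        # Step 3: update; x remains a convex combination of simplex vertices,
        # so it stays feasible without any projection step
        x = x + gamma * (s - x)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 5))
    b = rng.standard_normal(20)
    x = frank_wolfe_simplex(A, b)
    print(x)    # nonnegative entries summing to 1
```

Note that each iterate touches at most one new vertex, which is the mechanism behind the sparse-convex-combination property mentioned above.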

Comments

If the feasible set is given by a set of linear constraints, then the subproblem to be solved in each iteration becomes a linear program.
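As a concrete special case of such a linear subproblem, consider a box-constrained feasible set {x : l ≤ x ≤ u}. The per-iteration linear program min_s s · g subject to l ≤ s ≤ u separates by coordinate and is solved in closed form by taking the lower bound wherever the gradient g is positive and the upper bound wherever it is nonpositive. The helper below is a hypothetical sketch, not part of the original text.

```python
import numpy as np

def lmo_box(grad, lower, upper):
    """Closed-form solution of min_s s . grad subject to lower <= s <= upper."""
    # Coordinate-wise: a positive gradient entry pushes s to its lower bound,
    # a nonpositive one to its upper bound.
    return np.where(grad > 0, lower, upper)

g = np.array([1.5, -0.2, 0.0])
s = lmo_box(g, lower=np.full(3, -1.0), upper=np.full(3, 1.0))
print(s)    # a vertex of the box
```

For a general polytope described by linear inequalities, the same subproblem would instead be handed to an LP solver, but the algorithm is unchanged.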

The algorithm generally makes good progress towards the optimum during the first few iterations, but convergence often slows down substantially when close to the minimum point. For this reason the algorithm is perhaps best used to find an approximate solution. It can be shown that the worst-case convergence rate is sublinear; however, in practice the convergence rate has been observed to improve in the presence of many constraints.[3]

References

Notes

  1. ^ Frank, M.; Wolfe, P. (1956). "An algorithm for quadratic programming". Naval Research Logistics Quarterly. 3 (1–2): 95–110. doi:10.1002/nav.3800030109.
  2. ^ Clarkson, K. L. (2010). "Coresets, sparse greedy approximation, and the Frank–Wolfe algorithm". ACM Transactions on Algorithms. 6 (4). doi:10.1145/1824777.1824783.
  3. ^ Bertsekas, Dimitri (2003). Nonlinear Programming. Athena Scientific. p. 222. ISBN 1-886529-00-0.