Frank–Wolfe algorithm


In mathematical optimization, the reduced gradient method of Frank and Wolfe is an iterative method for nonlinear programming. Also known as the Frank–Wolfe algorithm and the convex combination algorithm, the reduced gradient method was proposed by Marguerite Frank and Phil Wolfe in 1956 as an algorithm for solving quadratic programming problems. The method is initialized by finding a feasible solution to the linear constraints. At each iteration, the method takes a descent step in the negative gradient direction, thereby reducing the objective function; this gradient descent step is "reduced" so that the iterate remains in the polyhedral feasible region defined by the linear constraints. Because quadratic programming is a generalization of linear programming, the reduced gradient method is a generalization of Dantzig's simplex algorithm for linear programming.

More generally, the reduced gradient method can be applied to nonlinear programming problems beyond quadratic programming. While the method is slower than competing methods and has been abandoned as a general-purpose method for nonlinear programming, it remains widely used for specially structured problems of large-scale optimization. In particular, the reduced gradient method remains popular and effective for finding approximate minimum-cost flows in transportation networks, which often have hundreds of thousands of nodes.

Problem statement

Minimize f(x) = xᵀEx + hᵀx
subject to x ∈ P,

where the n×n matrix E is positive semidefinite, h is an n×1 vector, and P represents a feasible region defined by a mix of linear inequality and equality constraints (for example Ax ≤ b, Cx = d).
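
As a small concrete illustration of this problem statement, the following sketch evaluates the quadratic objective and its gradient in Python with NumPy. The particular E, h, and test point are assumptions chosen for the example, not values from the article.

```python
# Minimal sketch of the quadratic objective f(x) = x^T E x + h^T x
# for an assumed 2-dimensional instance (E and h chosen for illustration).
import numpy as np

E = np.eye(2)                   # positive semidefinite n x n matrix (here n = 2)
h = np.array([-2.0, -3.0])      # n x 1 vector


def f(x):
    """Quadratic objective x^T E x + h^T x."""
    return x @ E @ x + h @ x


def grad_f(x):
    """Gradient 2 E x + h (valid for symmetric E)."""
    return 2.0 * E @ x + h


x = np.array([1.0, 1.5])
print(f(x), grad_f(x))          # -3.25 and the zero vector: x is the unconstrained minimizer
```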

Algorithm

Step 1. Initialization. Let k = 0 and let x_0 be any point in P.

Step 2. Convergence test. If ∇f(x_k) = 0 then Stop; we have found the minimum.

Step 3. Direction-finding subproblem. Solve the approximation of the problem obtained by replacing the function f with its first-order Taylor expansion around x_k. That is, solve for x̄_k:

Minimize ∇f(x_k)ᵀx̄
Subject to x̄ ∈ P

(Note that this is a linear program: x_k is fixed during Step 3, while the minimization takes place by varying x̄, and it is equivalent to minimization of ∇f(x_k)ᵀ(x̄ − x_k).)

Step 4. Step size determination. Find λ that minimizes f(x_k + λ(x̄_k − x_k)) subject to 0 ≤ λ ≤ 1. If λ = 0 then Stop; we have found the minimum.

Step 5. Update. Let x_{k+1} = x_k + λ(x̄_k − x_k), let k = k + 1, and go back to Step 2.
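
A minimal sketch of these steps in Python follows. It is an illustration rather than a reference implementation: it assumes the feasible region is given only by inequality constraints Ax ≤ b, uses SciPy's linprog for the direction-finding linear program of Step 3, performs the exact line search of Step 4 in closed form for the quadratic objective, and the problem data at the end are assumed values for demonstration.

```python
# Sketch of the Frank-Wolfe / reduced gradient iteration for
# minimizing f(x) = x^T E x + h^T x over {x : A x <= b}.
import numpy as np
from scipy.optimize import linprog


def frank_wolfe(E, h, A, b, x0, max_iter=100, tol=1e-8):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        grad = 2.0 * E @ x + h                       # gradient of the quadratic objective

        # Step 3: direction-finding subproblem, a linear program in x_bar.
        lp = linprog(c=grad, A_ub=A, b_ub=b,
                     bounds=[(None, None)] * len(x))
        x_bar = lp.x
        d = x_bar - x                                # feasible direction toward the LP vertex

        # Stopping test (Steps 2 and 4): no feasible direction gives further descent.
        if grad @ d > -tol:
            break

        # Step 4: exact line search over lambda in [0, 1] for the quadratic objective.
        denom = 2.0 * (d @ E @ d)
        lam = 1.0 if denom <= tol else min(1.0, -(grad @ d) / denom)

        # Step 5: update the iterate and repeat.
        x = x + lam * d
    return x


# Assumed example: minimize x1^2 + x2^2 - 2*x1 - 3*x2 over the box 0 <= x <= 2,
# written as A x <= b.  The iterates converge toward the minimizer (1.0, 1.5).
E = np.eye(2)
h = np.array([-2.0, -3.0])
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.array([2.0, 2.0, 0.0, 0.0])
print(frank_wolfe(E, h, A, b, x0=np.zeros(2)))
```

Because each new iterate is a convex combination of x_k and the linear-programming vertex x̄_k, every iterate stays inside the polyhedral feasible region, which is the sense in which the gradient step is "reduced".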

Comments

The algorithm generally makes good progress towards the optimum during the first few iterations, but convergence often slows down substantially near the minimum point. For this reason the algorithm is perhaps best used to find an approximate solution. It can be shown that the worst-case convergence rate is sublinear; however, in practice the convergence rate has been observed to improve when there are many constraints.[1]
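
For context, the standard textbook form of that sublinear bound (not stated in this revision) for a convex objective over a bounded feasible region is

f(x_k) − f(x*) ≤ 2C / (k + 2),

where C is a curvature constant depending on f and on the diameter of P; the error therefore decays only on the order of 1/k rather than geometrically.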

References

Notes

  1. ^ "Nonlinear Programming", Dimitri Bertsekas, 2003, page 222. Athena Scientific, ISBN 1-886529-00-0.