Frank–Wolfe algorithm: Difference between revisions

Revision as of 00:54, 17 December 2022 by Saung Tadashi (→External links; Tag: Visual edit)
Latest revision as of 19:37, 11 July 2024 by 129.97.124.26. Edit summary: The direction-finding subproblem and the update rule did not agree with each other. Either x_k + s ∈ D in the subproblem with x_{k+1} ← x_k + α s in the update, or s ∈ D in the subproblem with x_{k+1} ← x_k + α (s − x_k) in the update, is used. See Jaggi (2013).
(5 intermediate revisions by 5 users not shown)
Line 15:
:'''Step 1.''' ''Direction-finding subproblem:'' Find <math>\mathbf{s}_k</math> solving
::Minimize <math> \mathbf{s}^T \nabla f(\mathbf{x}_k)</math>
::Subject to <math> \mathbf{s} \in \mathcal{D}</math>
:''(Interpretation: Minimize the linear approximation of the problem given by the first-order [[Taylor series|Taylor approximation]] of <math>f</math> around <math>\mathbf{x}_k \!</math> constrained to stay within <math>\mathcal{D}</math>.)''

Line 23:
==Properties==
While competing methods such as [[gradient descent]] for constrained optimization require a [[Projection (mathematics)|projection step]] back to the feasible set in each iteration, the Frank–Wolfe algorithm only needs the solution of a ~~linear~~convex problem over the same set in each iteration, and automatically stays in the feasible set.

The convergence of the Frank–Wolfe algorithm is sublinear in general: the error in the objective function relative to the optimum is <math>O(1/k)</math> after ''k'' iterations, so long as the gradient is [[Lipschitz continuity|Lipschitz continuous]] with respect to some norm. The same convergence rate can also be shown if the sub-problems are only solved approximately.<ref>{{Cite journal | last1 = Dunn | first1 = J. C. | last2 = Harshbarger | first2 = S. | doi = 10.1016/0022-247X(78)90137-3 | title = Conditional gradient algorithms with open loop step size rules | journal = Journal of Mathematical Analysis and Applications | volume = 62 | issue = 2 | pages = 432 | year = 1978 | doi-access = free }}</ref>

The ~~iterates~~iterations of the algorithm can always be represented as a sparse convex combination of the extreme points of the feasible set, which has contributed to the popularity of the algorithm for sparse greedy optimization in [[machine learning]] and [[signal processing]] problems,<ref>{{Cite journal | last1 = Clarkson | first1 = K. L. | title = Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm | doi = 10.1145/1824777.1824783 | journal = ACM Transactions on Algorithms | volume = 6 | issue = 4 | pages = 1–30 | year = 2010 | citeseerx = 10.1.1.145.9299 }}</ref> as well as, for example, the optimization of [[flow network|minimum–cost flow]]s in [[Transport network|transportation network]]s.<ref>{{Cite journal | last1 = Fukushima | first1 = M. | title = A modified Frank-Wolfe algorithm for solving the traffic assignment problem | doi = 10.1016/0191-2615(84)90029-8 | journal = Transportation Research Part B: Methodological | volume = 18 | issue = 2 | pages = 169–177 | year = 1984 }}</ref>

If the feasible set is given by a set of linear constraints, then the subproblem to be solved in each iteration becomes a [[linear programming|linear program]].
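
The direction-finding subproblem and the projection-free update can be illustrated with a minimal Python sketch. It rests on several assumptions that are not part of the revision text: the feasible set <math>\mathcal{D}</math> is taken to be the probability simplex (so the linear subproblem is solved by picking a single vertex), the step size is the commonly used open-loop rule <math>\alpha = 2/(k+2)</math>, and the function name <code>frank_wolfe_simplex</code> and the quadratic test objective are hypothetical choices made for illustration only.

```python
import numpy as np


def frank_wolfe_simplex(grad, x0, num_iters=200):
    """Sketch of Frank-Wolfe over the probability simplex (assumed feasible set).

    grad: callable returning the gradient of f at x.
    x0:   feasible starting point (nonnegative entries summing to 1).
    """
    x = np.asarray(x0, dtype=float).copy()
    for k in range(num_iters):
        g = grad(x)
        # Direction-finding subproblem: minimize s^T g subject to s in D.
        # On the simplex a linear function attains its minimum at a vertex,
        # namely the unit vector at the smallest gradient coordinate.
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0
        # Assumed open-loop step size; a line search could be used instead.
        alpha = 2.0 / (k + 2.0)
        # Update x_{k+1} = x_k + alpha * (s - x_k): a convex combination of
        # two feasible points, so no projection step is needed.
        x = x + alpha * (s - x)
    return x


# Hypothetical test problem: f(x) = 0.5 * ||x - b||^2 with b inside the simplex.
b = np.array([0.1, 0.6, 0.3])
x = frank_wolfe_simplex(lambda v: v - b, x0=np.array([1.0, 0.0, 0.0]), num_iters=500)
print(x, x.sum())
```

Because every update is a convex combination of the current point and a vertex, the result stays in the simplex without any projection, and after ''k'' iterations it is a combination of at most ''k'' + 1 vertices. For a feasible set described by general linear constraints, the vertex selection would be replaced by a call to a linear programming solver.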