Proximal gradient methods for learning

Convergence of this fixed point scheme is well studied in the literature<ref name=combettes /><ref name=daubechies /> and is guaranteed under an appropriate choice of step size <math>\gamma</math> and loss function (such as the square loss taken here). [[Gradient descent#Extensions|Accelerated methods]], introduced by Nesterov in 1983, improve the rate of convergence under certain regularity assumptions on <math>F</math>.<ref name=nesterov>{{cite journal|last=Nesterov|first=Yurii|title=A method of solving a convex programming problem with convergence rate <math>O(1/k^2)</math>|journal=Soviet Mathematics - Doklady|year=1983|volume=27|issue=2|pages=372–376}}</ref> Such methods have since been studied extensively.<ref>{{cite book|last=Nesterov|first=Yurii|title=Introductory Lectures on Convex Optimization|year=2004|publisher=Kluwer Academic Publisher}}</ref>
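
The following is a minimal illustrative sketch of this fixed point scheme for the square loss with <math>\ell_1</math> regularization, where the proximity operator is componentwise soft thresholding, together with an optional Nesterov-style momentum step of the kind referenced above. The problem data, function names, and parameter choices are illustrative assumptions rather than details taken from the cited sources.

<syntaxhighlight lang="python">
import numpy as np

def soft_threshold(v, tau):
    """Proximity operator of tau * ||.||_1 (componentwise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient(X, y, lam, n_iter=500, accelerated=False):
    """Fixed point (forward-backward) iteration for (1/2n)||Xw - y||^2 + lam*||w||_1."""
    n, d = X.shape
    grad = lambda w: X.T @ (X @ w - y) / n       # gradient of the square loss F
    gamma = n / np.linalg.norm(X, 2) ** 2        # step size gamma = 1/L, with L = ||X||^2 / n
    w = np.zeros(d)
    z, t = w.copy(), 1.0                         # extrapolation state used only if accelerated
    for _ in range(n_iter):
        point = z if accelerated else w
        w_next = soft_threshold(point - gamma * grad(point), gamma * lam)
        if accelerated:                          # Nesterov-style momentum update
            t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
            z = w_next + ((t - 1.0) / t_next) * (w_next - w)
            t = t_next
        w = w_next
    return w

# Usage: recover a sparse vector from noisy linear measurements (synthetic data).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
w_true = np.zeros(50)
w_true[:5] = 1.0
y = X @ w_true + 0.01 * rng.standard_normal(200)
w_hat = proximal_gradient(X, y, lam=0.1, accelerated=True)
</syntaxhighlight>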
For more general learning problems, where the proximity operator of the regularization term <math>R</math> cannot be computed explicitly, such fixed point schemes can still be carried out using approximations to both the gradient and the proximity operator.<ref name=bauschke /><ref>{{cite journal|last=Villa|first=S.|author2=Salzo, S. |author3=Baldassarre, L. |author4=Verri, A. |title=Accelerated and inexact forward-backward algorithms|journal=SIAM J. Optim.|year=2013|volume=23|issue=3|pages=1607–1633|doi=10.1137/110844805|citeseerx=10.1.1.416.3633|s2cid=11379846 }}</ref>
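
As a rough sketch of one such approximation, consider the regularizer <math>R(w) = \lambda \|Dw\|_1</math> for a linear operator <math>D</math>: its proximity operator has no closed form in general, but can be approximated by a few projected gradient iterations on its dual problem, giving an inexact forward-backward step. All names, the choice of inner solver, and the parameter values below are illustrative assumptions, not part of the cited sources.

<syntaxhighlight lang="python">
import numpy as np

def approx_prox(v, D, tau, n_inner=50):
    """Approximate proximity operator of tau * ||D .||_1 at v.

    Uses the dual formulation prox(v) = v - D^T u*, where u* minimizes
    (1/2)||D^T u - v||^2 subject to ||u||_inf <= tau, solved here by a few
    projected gradient iterations.
    """
    u = np.zeros(D.shape[0])
    step = 1.0 / np.linalg.norm(D, 2) ** 2       # 1 / Lipschitz constant of the dual gradient
    for _ in range(n_inner):
        u = np.clip(u - step * (D @ (D.T @ u - v)), -tau, tau)
    return v - D.T @ u

def inexact_forward_backward(X, y, D, lam, n_iter=300):
    """Forward-backward iteration with an approximate proximity operator."""
    n, d = X.shape
    gamma = n / np.linalg.norm(X, 2) ** 2        # step size for the square loss
    w = np.zeros(d)
    for _ in range(n_iter):
        g = X.T @ (X @ w - y) / n                # forward (gradient) step
        w = approx_prox(w - gamma * g, D, gamma * lam)   # inexact backward (proximal) step
    return w

# Usage: a first-difference operator D encourages piecewise-constant solutions.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = X @ np.repeat([0.0, 1.0], [25, 25]) + 0.01 * rng.standard_normal(200)
D = np.eye(49, 50, k=1) - np.eye(49, 50)
w_hat = inexact_forward_backward(X, y, D, lam=0.1)
</syntaxhighlight>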
 
== Practical considerations ==