Proximal gradient methods for learning

Convergence of this fixed point scheme is well studied in the literature<ref name=combettes /><ref name=daubechies /> and is guaranteed under an appropriate choice of step size <math>\gamma</math> and loss function (such as the square loss taken here). [[Gradient descent#Extensions|Accelerated methods]], introduced by Nesterov in 1983, improve the rate of convergence under certain regularity assumptions on <math>F</math>.<ref name=nesterov>{{cite journal|last=Nesterov|first=Yurii|title=A method of solving a convex programming problem with convergence rate <math>O(1/k^2)</math>|journal=Soviet Mathematics - Doklady|year=1983|volume=27|issue=2|pages=372–376}}</ref> Such methods have since been studied extensively.<ref>{{cite book|last=Nesterov|first=Yurii|title=Introductory Lectures on Convex Optimization|year=2004|publisher=Kluwer Academic Publisher}}</ref>
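
The following is a minimal illustrative sketch of this fixed point scheme for the square loss with <math>\ell_1</math> regularization, where the proximity operator is componentwise soft thresholding, together with an optional Nesterov-style momentum step of the kind referenced above. The problem data, function names, and parameter choices are illustrative assumptions rather than details taken from the cited sources.

<syntaxhighlight lang="python">
import numpy as np

def soft_threshold(v, tau):
    """Proximity operator of tau * ||.||_1 (componentwise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient(X, y, lam, n_iter=500, accelerated=False):
    """Fixed point (forward-backward) iteration for (1/2n)||Xw - y||^2 + lam*||w||_1."""
    n, d = X.shape
    grad = lambda w: X.T @ (X @ w - y) / n       # gradient of the square loss F
    gamma = n / np.linalg.norm(X, 2) ** 2        # step size gamma = 1/L, with L = ||X||^2 / n
    w = np.zeros(d)
    z, t = w.copy(), 1.0                         # extrapolation state used only if accelerated
    for _ in range(n_iter):
        point = z if accelerated else w
        w_next = soft_threshold(point - gamma * grad(point), gamma * lam)
        if accelerated:                          # Nesterov-style momentum update
            t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
            z = w_next + ((t - 1.0) / t_next) * (w_next - w)
            t = t_next
        w = w_next
    return w

# Usage: recover a sparse vector from noisy linear measurements (synthetic data).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
w_true = np.zeros(50)
w_true[:5] = 1.0
y = X @ w_true + 0.01 * rng.standard_normal(200)
w_hat = proximal_gradient(X, y, lam=0.1, accelerated=True)
</syntaxhighlight>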
For more general learning problems, where the proximity operator of the regularization term <math>R</math> cannot be computed explicitly, such fixed point schemes can still be carried out using approximations to both the gradient and the proximity operator.<ref name=bauschke /><ref>{{cite journal|last=Villa|first=S.|author2=Salzo, S. |author3=Baldassarre, L. |author4=Verri, A. |title=Accelerated and inexact forward-backward algorithms|journal=SIAM J. Optim.|year=2013|volume=23|issue=3|pages=1607–1633|doi=10.1137/110844805|citeseerx=10.1.1.416.3633|s2cid=11379846 }}</ref>
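
As a rough sketch of one such approximation, consider the regularizer <math>R(w) = \lambda \|Dw\|_1</math> for a linear operator <math>D</math>: its proximity operator has no closed form in general, but can be approximated by a few projected gradient iterations on its dual problem, giving an inexact forward-backward step. All names, the choice of inner solver, and the parameter values below are illustrative assumptions, not part of the cited sources.

<syntaxhighlight lang="python">
import numpy as np

def approx_prox(v, D, tau, n_inner=50):
    """Approximate proximity operator of tau * ||D .||_1 at v.

    Uses the dual formulation prox(v) = v - D^T u*, where u* minimizes
    (1/2)||D^T u - v||^2 subject to ||u||_inf <= tau, solved here by a few
    projected gradient iterations.
    """
    u = np.zeros(D.shape[0])
    step = 1.0 / np.linalg.norm(D, 2) ** 2       # 1 / Lipschitz constant of the dual gradient
    for _ in range(n_inner):
        u = np.clip(u - step * (D @ (D.T @ u - v)), -tau, tau)
    return v - D.T @ u

def inexact_forward_backward(X, y, D, lam, n_iter=300):
    """Forward-backward iteration with an approximate proximity operator."""
    n, d = X.shape
    gamma = n / np.linalg.norm(X, 2) ** 2        # step size for the square loss
    w = np.zeros(d)
    for _ in range(n_iter):
        g = X.T @ (X @ w - y) / n                # forward (gradient) step
        w = approx_prox(w - gamma * g, D, gamma * lam)   # inexact backward (proximal) step
    return w

# Usage: a first-difference operator D encourages piecewise-constant solutions.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = X @ np.repeat([0.0, 1.0], [25, 25]) + 0.01 * rng.standard_normal(200)
D = np.eye(49, 50, k=1) - np.eye(49, 50)
w_hat = inexact_forward_backward(X, y, D, lam=0.1)
</syntaxhighlight>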
 
== Practical considerations ==