:<math>\min_{w\in\mathbb{R}^d} \frac{1}{n}\sum_{i=1}^n (y_i- \langle w,x_i\rangle)^2+ \lambda \|w\|_1, \quad \text{ where } x_i\in \mathbb{R}^d\text{ and } y_i\in\mathbb{R}.</math>
Proximal gradient methods offer a general framework for solving regularization problems from statistical learning theory with penalties tailored to the specific problem at hand.<ref name=combettes>{{cite journal|last=Combettes|first=Patrick L.|author2=Wajs, Valérie R. |title=Signal Recovery by Proximal Forward-Backward Splitting|journal=Multiscale Modeling & Simulation|year=2005|volume=4|issue=4|pages=1168–1200|doi=10.1137/050626090|s2cid=15064954|url=https://semanticscholar.org/paper/56974187b4d9a8757f4d8a6fd6facc8b4ad08240}}</ref><ref name=structSparse>{{cite journal|last=Mosci|first=S.|author2=Rosasco, L. |author3=Santoro, M. |author4=Verri, A. |author5=Villa, S. |title=Solving Structured Sparsity Regularization with Proximal Methods|journal=Machine Learning and Knowledge Discovery in Databases|year=2010|volume=6322|pages=418–433 |doi=10.1007/978-3-642-15883-4_27|series=Lecture Notes in Computer Science|isbn=978-3-642-15882-7|doi-access=free}}</ref> Such customized penalties can induce certain structure in problem solutions, such as ''sparsity'' (in the case of [[Lasso (statistics)|lasso]]) or ''group structure'' (in the case of [[Lasso (statistics)#Group LASSO|group lasso]]).
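For the lasso problem above, the proximal gradient (forward–backward) iteration alternates a gradient step on the smooth least-squares term with the proximal operator of the <math>\ell_1</math> penalty, which is component-wise soft thresholding. The following is a minimal sketch in Python with NumPy, not taken from any of the cited references; the function names <code>soft_threshold</code> and <code>ista_lasso</code> and the fixed step size <math>1/L</math> are illustrative choices.

<syntaxhighlight lang="python">
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1: component-wise soft thresholding.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista_lasso(X, y, lam, n_iter=500):
    # Proximal gradient (ISTA) for min_w (1/n) * ||y - X w||^2 + lam * ||w||_1.
    n, d = X.shape
    w = np.zeros(d)
    # Fixed step size 1/L, where L = (2/n) * sigma_max(X)^2 is a Lipschitz
    # constant of the gradient of the smooth least-squares term.
    L = 2.0 / n * np.linalg.norm(X, ord=2) ** 2
    for _ in range(n_iter):
        grad = 2.0 / n * X.T @ (X @ w - y)         # gradient of the smooth term
        w = soft_threshold(w - grad / L, lam / L)  # proximal step on the l1 term
    return w
</syntaxhighlight>

Accelerated variants such as FISTA add a momentum step to this iteration but use the same proximal operator.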
== Relevant background ==
=== Group lasso ===
Group lasso is a generalization of the [[Lasso (statistics)|lasso method]] to the case in which features are grouped into disjoint blocks.<ref name=groupLasso>{{cite journal|last=Yuan|first=M.|author2=Lin, Y. |title=Model selection and estimation in regression with grouped variables|journal=Journal of the Royal Statistical Society, Series B|year=2006|volume=68|issue=1|pages=49–67|doi=10.1111/j.1467-9868.2005.00532.x|s2cid=6162124|url=https://semanticscholar.org/paper/d98ef875e2cbde3e2cc8fad521e3cbfe1bddbd69}}</ref> Suppose the features are grouped into blocks <math>\{w_1,\ldots,w_G\}</math>. Here we take as a regularization penalty
:<math>R(w) =\sum_{g=1}^G \|w_g\|_2,</math>
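The proximal operator of this penalty acts block-wise: each group <math>w_g</math> is shrunk toward zero by block soft thresholding, and any group whose norm falls below the threshold is set exactly to zero. A minimal sketch in Python follows; the function name <code>prox_group_lasso</code> and the representation of the blocks as a list of index arrays <code>groups</code> are illustrative assumptions, not part of any particular library.

<syntaxhighlight lang="python">
import numpy as np

def prox_group_lasso(w, groups, tau):
    # Proximal operator of tau * sum_g ||w_g||_2 (block soft thresholding).
    # `groups` is a list of index arrays forming a partition of the coordinates.
    out = np.asarray(w, dtype=float).copy()
    for g in groups:
        norm_g = np.linalg.norm(out[g])
        if norm_g <= tau:
            out[g] = 0.0                      # the whole block is zeroed out
        else:
            out[g] *= 1.0 - tau / norm_g      # shrink the block toward zero
    return out
</syntaxhighlight>

Substituting this operator for the component-wise soft thresholding in the proximal gradient iteration above yields a solver for the group lasso problem.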