Revision as of 03:34, 3 June 2014 edit Carriearchdale (talk \| contribs) 12,530 edits m →Fixed point iterative schemes: clean up, typo(s) fixed: penality → penalty using AWB ← Previous edit		Revision as of 22:45, 23 July 2014 edit undo Monkbot (talk \| contribs) Bots 3,695,952 edits m Task 5: Fix CS1 deprecated coauthor parameter errors Next edit →
Line 2: :<math>\min_{w\in\mathbb{R}^d} \frac{1}{n}\sum_{i=1}^n (y_i- \langle w,x_i\rangle)^2+ \lambda \\|w\\|_1, \quad \text{ where } x_i\in \mathbb{R}^d\text{ and } y_i\in\mathbb{R}.</math> Proximal gradient methods offer a general framework for solving regularization problems from statistical learning theory with penalties that are tailored to a specific problem application.<ref name=combettes>{{cite journal\|last=Combettes\|first=Patrick L.\|~~coauthors~~author2=Wajs, Valérie R. \|title=Signal Recovering by Proximal Forward-Backward Splitting\|journal=Multiscale Model. Simul.\|year=2005\|volume=4\|issue=4\|pages=1168–1200\|url=http://epubs.siam.org/doi/abs/10.1137/050626090}}</ref><ref name=structSparse>{{cite journal\|last=Mosci\|first=S.\|coauthors=Rosasco, L., Matteo, S., Verri, A., and Villa, S.\|title=Solving Structured Sparsity Regularization with Proximal Methods\|journal=Machine Learning and Knowledge Discovery in Databases\|year=2010\|volume=6322\|pages=418–433}}</ref> Such customized penalties can help to induce certain structure in problem solutions, such as ''sparsity'' (in the case of [[#Lasso regularization\|lasso]]) or ''group structure'' (in the case of [[#Exploiting group structure\|group lasso]]). == Relevant background == Line 105: === Group lasso === Group lasso is a generalization of the [[#Lasso regularization\|lasso method]] when features are grouped into disjoint blocks.<ref name=groupLasso>{{cite journal\|last=Yuan\|first=M.\|~~coauthors~~author2=Lin, Y. \|title=Model selection and estimation in regression with grouped variables\|journal=J. R. Stat. Soc. B\|year=2006\|volume=68\|issue=1\|pages=49–67\|doi=10.1111/j.1467-9868.2005.00532.x}}</ref> Suppose the features are grouped into blocks <math>\{w_1,\ldots,w_G\}</math>. Here we take as a regularization penalty :<math>R(w) =\sum_{g=1}^G \\|w_g\\|_2,</math>

Proximal gradient methods for learning: Difference between revisions