Proximal gradient methods for learning: Difference between revisions

Carriearchdale (talk | contribs)
m Fixed point iterative schemes: clean up, typo(s) fixed: penality → penalty using AWB
Monkbot (talk | contribs)
Line 2:
:<math>\min_{w\in\mathbb{R}^d} \frac{1}{n}\sum_{i=1}^n (y_i- \langle w,x_i\rangle)^2+ \lambda \|w\|_1, \quad \text{ where } x_i\in \mathbb{R}^d\text{ and } y_i\in\mathbb{R}.</math>
 
Proximal gradient methods offer a general framework for solving regularization problems from statistical learning theory with penalties that are tailored to a specific problem application.<ref name=combettes>{{cite journal|last=Combettes|first=Patrick L.|author2=Wajs, Valérie R. |title=Signal Recovering by Proximal Forward-Backward Splitting|journal=Multiscale Model. Simul.|year=2005|volume=4|issue=4|pages=1168–1200|url=http://epubs.siam.org/doi/abs/10.1137/050626090}}</ref><ref name=structSparse>{{cite journal|last=Mosci|first=S.|coauthors=Rosasco, L., Matteo, S., Verri, A., and Villa, S.|title=Solving Structured Sparsity Regularization with Proximal Methods|journal=Machine Learning and Knowledge Discovery in Databases|year=2010|volume=6322|pages=418–433}}</ref> Such customized penalties can help to induce certain structure in problem solutions, such as ''sparsity'' (in the case of [[#Lasso regularization|lasso]]) or ''group structure'' (in the case of [[#Exploiting group structure|group lasso]]).
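
As a rough illustration of how a proximal gradient iteration can be applied to the lasso problem stated above, the following Python sketch alternates a gradient step on the squared loss with soft thresholding, which is the proximal operator of the scaled <math>\ell_1</math> norm. The function names (<code>soft_threshold</code>, <code>lasso_objective</code>, <code>ista_lasso</code>), the fixed step size <math>1/L</math>, the zero initialization, and the iteration count are assumptions made for this example and are not taken from the cited references.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_objective(w, X, y, lam):
    """(1/n) * sum_i (y_i - <w, x_i>)^2 + lam * ||w||_1."""
    n = X.shape[0]
    return np.sum((y - X @ w) ** 2) / n + lam * np.sum(np.abs(w))


def ista_lasso(X, y, lam, n_iter=500):
    """Illustrative proximal gradient loop for the lasso (assumes X is not all zeros)."""
    n, d = X.shape
    # Lipschitz constant of the gradient of the smooth term (1/n) * ||y - Xw||^2.
    L = 2.0 * np.linalg.norm(X, 2) ** 2 / n
    step = 1.0 / L  # assumed fixed step size
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ w - y) / n               # gradient of the squared loss
        w = soft_threshold(w - step * grad, step * lam)  # proximal (shrinkage) step
    return w
```

Calling <code>ista_lasso(X, y, lam)</code> with an <math>n \times d</math> data matrix whose rows are the <math>x_i</math> returns an approximate minimizer. Larger values of <code>lam</code> drive more coordinates of the result exactly to zero, which is the sparsity-inducing effect described above.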
 
== Relevant background ==
Line 105:
=== Group lasso ===
 
Group lasso is a generalization of the [[#Lasso regularization|lasso method]] to the case where features are grouped into disjoint blocks.<ref name=groupLasso>{{cite journal|last=Yuan|first=M.|author2=Lin, Y. |title=Model selection and estimation in regression with grouped variables|journal=J. R. Stat. Soc. B|year=2006|volume=68|issue=1|pages=49–67|doi=10.1111/j.1467-9868.2005.00532.x}}</ref> Suppose the features are grouped into blocks <math>\{w_1,\ldots,w_G\}</math>. Here we take as a regularization penalty
 
:<math>R(w) =\sum_{g=1}^G \|w_g\|_2,</math>