Proximal gradient methods for learning
Group lasso is a generalization of the [[#Lasso regularization|lasso method]] when features are grouped into disjoint blocks.<ref name=groupLasso>{{cite journal|last=Yuan|first=M.|coauthors=Lin, Y.|title=Model selection and estimation in regression with grouped variables|journal=J. R. Stat. Soc. B|year=2006|volume=68|issue=1|pages=49-67|doi=10.1111/j.1467-9868.2005.00532.x}}</ref> Suppose the features are grouped into blocks <math>\{w_1,\ldots,w_G\}</math>. Here we take as a regularization penalty
 
:<math>R(w) =\sum_{g=1}^G \|w_g\|_2,</math>
 
which is the sum of the <math>\ell_2</math> norms of the feature vectors for the different groups. A proximity operator analysis similar to the one above can be used to compute the proximity operator for this penalty. Where the lasso penalty has a proximity operator which is soft thresholding on each individual component, the proximity operator for the group lasso is soft thresholding applied groupwise. For the group <math>w_g</math>, the proximity operator of <math>\lambda\gamma\left(\sum_{g=1}^G \|w_g\|_2\right)</math> is given by
 
:<math>\widetilde{S}_{\lambda\gamma }(w_g) = \left\{ \begin{array}{rl}
w_g-\lambda\gamma :<math>\fracwidetilde{w_gS}_{\|w_g\|_2},&\|w_g\|_2>\lambda\gamma }(w_g) = \\begin{cases}
0w_g-\lambda\gamma \frac{w_g}{\|w_g\|_2}, & \|w_g\|_2\leq >\lambda\gamma \\
0, & \|w_g\|_2\leq \lambda\gamma
\end{arraycases}\right.</math>
 
where <math>w_g</math> is the <math>g</math>th group.
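The groupwise soft-thresholding operator above can be sketched in a few lines of NumPy. This is an illustrative implementation, not from the cited reference; the function name <code>group_soft_threshold</code>, the group-index representation, and the single threshold parameter <code>t</code> (playing the role of <math>\lambda\gamma</math>) are assumptions for the example.

```python
import numpy as np

def group_soft_threshold(w, groups, t):
    # Proximity operator of t * sum_g ||w_g||_2 (group lasso penalty),
    # applied to the vector w whose coordinates are partitioned by `groups`
    # (a list of index lists, one per disjoint block).
    out = np.zeros_like(w, dtype=float)
    for idx in groups:
        wg = w[idx]
        norm = np.linalg.norm(wg)  # Euclidean norm of the group
        if norm > t:
            # Shrink the whole group toward zero; groups with
            # norm <= t stay at the zero initialization.
            out[idx] = wg - t * wg / norm
    return out
```

For example, with <code>w = [3, 4, 0.1, 0.2]</code>, groups <code>[[0, 1], [2, 3]]</code>, and <code>t = 1</code>, the first group has norm 5 and is shrunk by the factor <math>1 - 1/5</math> to <code>[2.4, 3.2]</code>, while the second group has norm below the threshold and is set to zero, mirroring the two cases of the formula.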