Proximal gradient methods for learning
Group lasso is a generalization of the [[#Lasso regularization|lasso method]] when features are grouped into disjoint blocks.<ref name=groupLasso>{{cite journal|last=Yuan|first=M.|coauthors=Lin, Y.|title=Model selection and estimation in regression with grouped variables|journal=J. R. Stat. Soc. B|year=2006|volume=68|issue=1|pages=49-67|doi=10.1111/j.1467-9868.2005.00532.x}}</ref> Suppose the features are grouped into blocks <math>\{w_1,\ldots,w_G\}</math>. Here we take as a regularization penalty
 
:<math>R(w) =\sum_{g=1}^G \|w_g\|_2,</math>
 
which is the sum of the <math>\ell_2</math> norms of the feature vectors for the different groups. A proximity operator analysis similar to the one above can be used to compute the proximity operator for this penalty. Where the lasso penalty has a proximity operator which is soft thresholding on each individual component, the proximity operator for the group lasso is soft thresholding applied groupwise. For the group <math>w_g</math>, the proximity operator of <math>\lambda\gamma\left(\sum_{g=1}^G \|w_g\|_2\right)</math> is given by
 
:<math>\widetilde{S}_{\lambda\gamma }(w_g) = \left\{ \begin{array}{rl}
w_g-\lambda\gamma :<math>\fracwidetilde{w_gS}_{\|w_g\|_2},&\|w_g\|_2>\lambda\gamma }(w_g) = \\begin{cases}
0w_g-\lambda\gamma \frac{w_g}{\|w_g\|_2}, & \|w_g\|_2\leq >\lambda\gamma \\
0, & \|w_g\|_2\leq \lambda\gamma
\end{arraycases}\right.</math>
 
where <math>w_g</math> is the <math>g</math>th group.
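The groupwise soft-thresholding operator above can be sketched in a few lines of NumPy. This is an illustrative implementation, not from the cited reference; the function name <code>group_soft_threshold</code>, the group-index representation, and the single threshold parameter <code>t</code> (playing the role of <math>\lambda\gamma</math>) are assumptions for the example.

```python
import numpy as np

def group_soft_threshold(w, groups, t):
    # Proximity operator of t * sum_g ||w_g||_2 (group lasso penalty),
    # applied to the vector w whose coordinates are partitioned by `groups`
    # (a list of index lists, one per disjoint block).
    out = np.zeros_like(w, dtype=float)
    for idx in groups:
        wg = w[idx]
        norm = np.linalg.norm(wg)  # Euclidean norm of the group
        if norm > t:
            # Shrink the whole group toward zero; groups with
            # norm <= t stay at the zero initialization.
            out[idx] = wg - t * wg / norm
    return out
```

For example, with <code>w = [3, 4, 0.1, 0.2]</code>, groups <code>[[0, 1], [2, 3]]</code>, and <code>t = 1</code>, the first group has norm 5 and is shrunk by the factor <math>1 - 1/5</math> to <code>[2.4, 3.2]</code>, while the second group has norm below the threshold and is set to zero, mirroring the two cases of the formula.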